Abstract

We consider stationary hidden Markov models with finite state space and nonparametric
modeling of the emission distributions. It was unknown until very recently that
such models are identifiable. In this paper, we propose a new penalized least-squares estimator
for the emission distributions which is statistically optimal and practically tractable.
We prove a nonasymptotic oracle inequality for our nonparametric estimator of the emission
distributions. A consequence is that this new estimator is rate minimax adaptive up
to a logarithmic term. Our methodology is based on projections of the emission distributions
onto nested subspaces of increasing complexity. The popular spectral estimators
are unable to achieve the optimal rate but may be used as initial points in our procedure.
Simulations are given that show the improvement obtained when applying the least-squares
minimization consecutively to the spectral estimation.

Keywords: nonparametric estimation; hidden Markov models; minimax adaptive estimation;
oracle inequality; penalized least-squares.


1. Introduction 

1.1 Context and motivations 

Finite state space hidden Markov models (HMMs for short) are widely used to model data
evolving in time and coming from heterogeneous populations. They seem to be reliable
tools to model practical situations in a variety of applications such as economics, genomics,
signal processing and image analysis, ecology, environment, speech recognition, to name
but a few. From a statistical viewpoint, finite state space HMMs are stochastic processes
$(X_j, Y_j)_{j \geq 1}$ where $(X_j)_{j \geq 1}$ is a Markov chain with finite state space and, conditionally on
$(X_j)_{j \geq 1}$, the $Y_j$'s are independent with a distribution depending only on $X_j$. The observations
are $Y_{1:N} = (Y_1, \dots, Y_N)$ and the associated states $X_{1:N} = (X_1, \dots, X_N)$ are unobserved. The
parameters of the model are the initial distribution, the transition matrix of the hidden chain,
and the emission distributions of the observations, that is the probability distributions of the
$Y_j$'s conditionally on $X_j = x$ for all possible $x$'s. In this paper we shall consider stationary
ergodic HMMs so that the initial distribution is the stationary distribution of the (ergodic)
hidden Markov chain.


Until very recently, asymptotic performances of estimators were proved only in the
parametric setting (that is, with finitely many unknown parameters). Nonparametric
methods for HMMs have nevertheless been considered in applied papers, but with no theoretical
guarantees, see for instance Couvreur and Couvreur (2000) for voice activity detection,
Lambert et al. (2003) for climate state identification, Lefevre (2003) for automatic speech
recognition, Shang and Chan (2009) for facial expression recognition, Volant et al. (2013)
for methylation comparison of proteins, Yau et al. (2011) for copy number variants identification
in DNA analysis.

The preliminary obstacle to obtaining theoretical results on general finite state space
nonparametric HMMs was to understand when such models are indeed identifiable. Marginal
distributions of finitely many observations are finite mixtures of products of the emission
distributions. It is clear that identifiability cannot be obtained based on the marginal
distribution of only one observation. It is necessary, and it is sufficient, to consider the marginal
distribution of at least three consecutive observations to get identifiability, see Gassiat et al.
(2015), following Allman et al. (2009) and Hsu et al. (2012).


1.2 Contribution 


The aim of our paper is to propose a new approach to estimate nonparametric HMMs with 
a statistically optimal and practically tractable method. We obtain this way nonparametric 
estimators of the emission distributions that achieve the minimax rate of estimation in an 
adaptive setting. 

Our perspective is based on estimating the projections of the emission laws onto nested
subspaces of increasing complexity. Our analysis encompasses any family of nested subspaces
of Hilbert spaces and works with a large variety of models. In this framework one could think
of using the spectral estimators proposed by Hsu et al. (2012) and Anandkumar et al. (2012)
in the parametric framework, by extending them to the nonparametric framework. But a
careful analysis of the tradeoff between sampling size and approximation complexity shows
that they do not lead to rate optimal estimators of the emission densities, see De Castro et al.
(2015) for a formal statement and proof. This can be easily understood. Indeed, the spectral
estimators of the emission densities are computed as functions of the empirical estimator of
the marginal distribution of three consecutive observations on $\mathcal{Y}^3$ (where $\mathcal{Y}$ is the observation
space), for which, roughly speaking, when $\mathcal{Y}$ is a subset of $\mathbb{R}$, the optimal rate is $N^{-s/(2s+3)}$,
$N$ being the number of observations and $s$ the smoothness of the emission densities. Thus
the rate obtained this way for the emission densities is also $N^{-s/(2s+3)}$. But since those
emission densities describe one dimensional random variables on $\mathcal{Y}$, one could hope to be
able to obtain the sharper rate $N^{-s/(2s+1)}$. This is the rate we obtain, up to a $\log N$ term,
with our new method. Let us explain how it works.


Using the HMM modeling, and using sieves for the emission densities on $\mathcal{Y}$, we propose
a penalized least squares estimator in the model selection framework. We prove an oracle
inequality for the $L^2$-risk of the estimator of the density of $(Y_1, Y_2, Y_3)$, see Theorem 4. Since
the complexity of the model is that given by the sieves for the emission densities, this leads,
up to a $\log N$ term, to the adaptive minimax rate computed as for the density of only one
observation $Y_1$ though we estimate the density of $(Y_1, Y_2, Y_3)$. Roughly speaking, when the
observations are one dimensional, that is when $\mathcal{Y}$ is a subset of $\mathbb{R}$, the obtained rate for the
density of $(Y_1, Y_2, Y_3)$ is of order $N^{-s/(2s+1)}$ up to a $\log N$ term, $N$ being the number of
observations and $s$ the smoothness of the emission densities.
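
To give a feeling for the gap between the two rates discussed above, the following minimal sketch evaluates $N^{-s/(2s+3)}$ and $N^{-s/(2s+1)}$ for a few sample sizes. It is only an illustration: the helper name `rate`, the smoothness value and the sample sizes are our own assumptions, and logarithmic factors and constants are ignored.

```python
def rate(n_obs: int, smoothness: float, dim: int) -> float:
    """Nonparametric rate N^(-s / (2s + dim)) for smoothness s in dimension dim."""
    return n_obs ** (-smoothness / (2.0 * smoothness + dim))


if __name__ == "__main__":
    s = 2.0  # assumed smoothness, chosen only for illustration
    for n_obs in (10**3, 10**4, 10**5, 10**6):
        three_dim = rate(n_obs, s, dim=3)  # rate attached to a density on Y^3
        one_dim = rate(n_obs, s, dim=1)    # rate attached to a density on Y
        print(f"N={n_obs:>8d}  N^(-s/(2s+3))={three_dim:.4f}  N^(-s/(2s+1))={one_dim:.4f}")
```

For any fixed $s > 0$ the one dimensional rate is the smaller of the two, and the ratio between them grows polynomially with $N$.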

The key point is then to be able to go back to the emission densities. This is the
cornerstone of our main result. We prove in Theorem 6 that, under the assumption [HD]
defined in Section 4.2, the quadratic risk for the density of $(Y_1, Y_2, Y_3)$ is lower bounded
by some positive constant multiplied by the quadratic risk for the emission densities. This
technical assumption is generically satisfied in the sense that it holds for all possible emission
densities for which the $L^2$-norms and Hilbert dot products do not lie on a particular algebraic
surface with coefficients depending on the transition matrix of the hidden chain. Moreover,
we prove that, when the number of hidden states equals two, this assumption is always
verified when the two emission densities are distinct, see Lemma 5.

Our methodology requires that we have a preliminary estimator of the transition matrix.
To get such an estimator, it is possible to use spectral methods. Thus our approach is the
following. First, get a preliminary estimator of the initial distribution and the transition
matrix of the hidden chain. Second, apply penalized least squares estimation on the density
of three consecutive observations, using HMM modeling, model selection on the emission
densities, and initial distribution and transition matrix of the hidden chain set at the estimated
value. This gives emission density estimators which have minimax adaptive rate, as
our main result states, see Theorem 7. A simplified version of this theorem can be given as
follows.


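Theorem 1 Assume $(Y_j)_{j \geq 1}$ is a hidden Markov model on $\mathbb{R}$, with latent Markov chain
$(X_j)_{j \geq 1}$ with $K$ possible values and true transition matrix $Q^\star$. Denote $f_k^\star$ the density of $Y_n$
given $X_n = k$, for $k = 1, \dots, K$. Assume the true transition matrix $Q^\star$ is full rank and
the true emission densities $f_k^\star$, $k = 1, \dots, K$ are linearly independent, with smoothness $s$.
Assume that [HD] holds true. Then, up to label switching, for $N$ the number of observations
large enough, the estimators $\hat{Q}$, $\hat{f}_k$, $k = 1, \dots, K$ built in Sections 3 and 5 satisfy

$$\mathbb{E}\big[\|Q^\star - \hat{Q}\|^2\big] = O\Big(\frac{\log N}{N}\Big),$$

$$\mathbb{E}\big[\|f_k^\star - \hat{f}_k\|_2^2\big] = O\Big(\Big(\frac{\log N}{N}\Big)^{2s/(2s+1)}\Big), \quad k = 1, \dots, K.$$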


Moreover, since the family of sieves we consider is that given by finite dimensional spaces
described by an orthonormal basis, we are able to use the spectral estimators of the coefficients
of the densities as initial points in the least squares minimization. This is important
since, in the HMM framework, least squares minimization does not have an explicit solution
and may lead to several local minima. However, since the spectral estimates are proved to
be consistent, we may be confident that their use as initial point is enough. Simulations
indeed confirm this point.

To conclude we claim that our results support a powerful new approach to estimate, for 
the first time, nonparametric HMMs with a statistically optimal and practically tractable 
method.

1.3 Related works


The papers Allman et al. (2009), Hsu et al. (2012) and Anandkumar et al. (2012) paved the
way to obtain identifiability under reasonable assumptions. In Anandkumar et al. (2012)
the authors point out a structural link between multivariate mixtures with conditionally
independent observations and finite state space HMMs. In Hsu et al. (2012) the authors
propose a spectral method to estimate all parameters for finite state space HMMs (with
finitely many observations), under the assumption that the transition matrix of the hidden
chain is non singular, and that the (finitely valued) emission distributions are linearly independent.
Extension to emission distributions on any space, under the linear independence
assumptions (and keeping the assumption of non singularity of the transition matrix), made it
possible to prove the general identifiability result for finite state space HMMs, see Gassiat et al.
(2015), where also model selection likelihood methods and nonparametric kernel methods
are proposed to get nonparametric estimators. Let us notice also Vernet (2015), which proves
theoretical consistency of the posterior in nonparametric Bayesian methods for finite state
space HMMs with adequate assumptions. Later, Alexandrovich and Holzmann (2014) obtained
identifiability when the emission distributions are all distinct (not necessarily linearly
independent) and still when the transition matrix of the hidden chain is full rank. In the
nonparametric multivariate mixture model, Song et al. (2014) prove that any linear functional
of the emission distributions may be estimated with parametric rate of convergence in
the context of reproducing kernel Hilbert spaces. The latter uses spectral methods, not the
same but similar to the ones proposed in Hsu et al. (2012) and Anandkumar et al. (2012).
Recent papers that contain theoretical results on different kinds of nonparametric HMMs
are Gassiat and Rousseau, where the emitted distributions are translated versions of each
other, and Dumont and Le Corff, in which the authors consider regression models with hidden
regressor variables that can be Markovian on a continuous state space.


1.4 Outline of the paper 

In Section 2 we set the notations, the model we shall study, and the assumptions we shall
consider. We then state an identifiability lemma (see Lemma 3) that will be useful for our
estimation method. In Sections 3 and 4 we give our main results. We explain the penalized
least-squares estimation method in Section 3, and we prove in Section 4 that, when the
transition matrix is irreducible and aperiodic, when the emission distributions are linearly
independent and the penalty is adequately chosen, then, under a technical assumption, the
penalized least squares estimator is asymptotically minimax adaptive up to a $\log N$ term, see
Theorem 7 and Corollary 10. For this, we first prove an oracle inequality for the estimation
of the density of $(Y_1, Y_2, Y_3)$, see Theorem 4; then we prove the key result relating the risk
of the density of $(Y_1, Y_2, Y_3)$ to that of the emission densities, see Theorem 6. The latter
holds under a technical assumption which we prove to be always verified in case $K = 2$,
see Lemma 5. Finally, we need the performances of the spectral estimator of the transition
matrix and of the stationary distribution, which are given in Section 5, see Theorem 11
proved in De Castro et al. (2015). We finally present simulations in Section 6 to illustrate
our theoretical results. Those simulations show in particular the improvement obtained when
applying the least-squares minimization consecutively to the spectral estimation. Detailed
proofs are given in Section 8.

2. Notations and assumptions

2.1 Nonparametric hidden Markov model 

Let $K$, $D$ be positive integers and let $\mathcal{L}^D$ be the Lebesgue measure on $\mathbb{R}^D$. Denote by $\mathcal{X}$
the set $\{1, \dots, K\}$ of hidden states, $\mathcal{Y} \subset \mathbb{R}^D$ the observation space, and $\Delta_K$ the space of
probability measures on $\mathcal{X}$ identified to the $(K-1)$-dimensional simplex. Let $(X_n)_{n \geq 1}$ be
a Markov chain on $\mathcal{X}$ with $K \times K$ transition matrix $Q^\star$ and initial distribution $\pi^\star \in \Delta_K$.
Let $(Y_n)_{n \geq 1}$ be a sequence of observed random variables on $\mathcal{Y}$. Assume that, conditional on
$(X_n)_{n \geq 1}$, the observations $(Y_n)_{n \geq 1}$ are independent and, for all $n \in \mathbb{N}$, the distribution of
$Y_n$ depends only on $X_n$. Denote by $\mu_k^\star$ the conditional law of $Y_n$ conditional on $\{X_n = k\}$,
and assume that $\mu_k^\star$ has density $f_k^\star$ with respect to the measure $\mathcal{L}^D$ on $\mathcal{Y}$:

$$\forall k \in \mathcal{X}, \quad d\mu_k^\star = f_k^\star \, d\mathcal{L}^D.$$

Denote by $\mathcal{F}^\star := \{f_1^\star, \dots, f_K^\star\}$ the set of emission densities with respect to the Lebesgue
measure. Then, for any integer $n$, the distribution of $(Y_1, \dots, Y_n)$ has density with respect
to $(\mathcal{L}^D)^{\otimes n}$

$$\sum_{k_1, \dots, k_n = 1}^{K} \pi^\star(k_1) Q^\star(k_1, k_2) \cdots Q^\star(k_{n-1}, k_n) f_{k_1}^\star(y_1) \cdots f_{k_n}^\star(y_n).$$

We shall denote $g^\star$ the density of $(Y_1, Y_2, Y_3)$.
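
As an illustration of the density formula above, here is a minimal brute-force sketch that sums over all hidden paths. The function names, the two-state transition matrix and the emission densities are hypothetical choices made for the example; the cost grows like $K^n$, which is harmless for $n = 3$.

```python
from itertools import product
from typing import Callable, List, Sequence

Density = Callable[[float], float]


def stationary_distribution(Q: Sequence[Sequence[float]], n_iter: int = 10_000) -> List[float]:
    """Approximate the stationary distribution of an irreducible aperiodic chain by power iteration."""
    K = len(Q)
    pi = [1.0 / K] * K
    for _ in range(n_iter):
        pi = [sum(pi[i] * Q[i][j] for i in range(K)) for j in range(K)]
    return pi


def joint_density(pi: Sequence[float], Q: Sequence[Sequence[float]],
                  f: Sequence[Density], ys: Sequence[float]) -> float:
    """Density of (Y_1, ..., Y_n) at ys, as a sum over all hidden paths (k_1, ..., k_n)."""
    K = len(pi)
    total = 0.0
    for path in product(range(K), repeat=len(ys)):
        weight = pi[path[0]]
        for a, b in zip(path, path[1:]):
            weight *= Q[a][b]            # transition probabilities along the path
        for k, y in zip(path, ys):
            weight *= f[k](y)            # emission densities along the path
        total += weight
    return total


if __name__ == "__main__":
    Q = [[0.7, 0.3], [0.2, 0.8]]                  # hypothetical transition matrix
    f = [lambda y: 1.0, lambda y: 2.0 * y]        # hypothetical densities on [0, 1]
    pi = stationary_distribution(Q)
    print(joint_density(pi, Q, f, [0.2, 0.5, 0.9]))  # value of g* at one point
```

With states indexed from 0 instead of 1, `joint_density(pi, Q, f, [y1, y2, y3])` is the value $g^\star(y_1, y_2, y_3)$ when `pi`, `Q` and `f` are the true parameters.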

In this paper we shall address two observation schemes. We shall consider $N$ i.i.d.
samples $(Y_1^{(s)}, Y_2^{(s)}, Y_3^{(s)})_{s=1}^{N}$ of three consecutive observations (Scenario A) or consecutive
observations of the same chain (Scenario B):

$$\forall s \in \{1, \dots, N\}, \quad (Y_1^{(s)}, Y_2^{(s)}, Y_3^{(s)}) := (Y_s, Y_{s+1}, Y_{s+2}).$$

2.2 Projections of the population joint laws 

Denote by $(L^2(\mathcal{Y}, \mathcal{L}^D), \|\cdot\|_2)$ the Hilbert space of square integrable functions on $\mathcal{Y}$ with respect
to the Lebesgue measure $\mathcal{L}^D$ equipped with the usual inner product $\langle \cdot, \cdot \rangle$ on $L^2(\mathcal{Y}, \mathcal{L}^D)$.
Assume $\mathcal{F}^\star \subset L^2(\mathcal{Y}, \mathcal{L}^D)$.

Let $(M_r)_{r \geq 1}$ be an increasing sequence of integers, and let $(\mathfrak{P}_{M_r})_{r \geq 1}$ be a sequence
of nested subspaces with dimension $M_r$ such that their union is dense in $L^2(\mathcal{Y}, \mathcal{L}^D)$. Let
$\Phi_{M_r} := \{\varphi_1, \dots, \varphi_{M_r}\}$ be an orthonormal basis of $\mathfrak{P}_{M_r}$. Recall that for all $f \in L^2(\mathcal{Y}, \mathcal{L}^D)$,

$$\lim_{r \to \infty} \sum_{m=1}^{M_r} \langle f, \varphi_m \rangle \varphi_m = f, \tag{1}$$

in $L^2(\mathcal{Y}, \mathcal{L}^D)$. Note that changing $M_r$ may change all functions $\varphi_m$, $1 \leq m \leq M_r$ in the
basis $\Phi_{M_r}$, which we shall not indicate in the notation for sake of readability. Also, we drop
the dependence on $r$ and write $M$ instead of $M_r$. Define the projection of the emission laws
onto $\mathfrak{P}_M$ by

$$\forall k \in \mathcal{X}, \quad f_{M,k}^\star := \sum_{m=1}^{M} \langle f_k^\star, \varphi_m \rangle \varphi_m.$$

We shall write $\mathbf{f}_M^\star := (f_{M,1}^\star, \dots, f_{M,K}^\star)$ and $\mathbf{f}^\star := (f_1^\star, \dots, f_K^\star)$ throughout this paper.

Remark 2 One can consider the following standard examples:

(Spline) The space of piecewise polynomials of degree bounded by $d_r$ based on the regular partition
with $p_r^D$ regular pieces on $\mathcal{Y} = [0, 1]^D$. It holds that $M_r = (d_r + 1)^D p_r^D$.

(Trig.) The space of real trigonometric polynomials on $\mathcal{Y} = [0, 1]^D$ with degree less than $r$. It
holds that $M_r = (2r + 1)^D$.

(Wav.) A wavelet basis $\Phi_{M_r}$ of scale $r$ on $\mathcal{Y} = [0, 1]^D$, see Meyer (1992). In this case, it holds
that $M_r = 2^{(r+1)D}$.
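
To make the projection onto a sieve concrete, the sketch below builds the trigonometric basis in dimension $D = 1$ (so that $M = 2r + 1$) and computes the coefficients $\langle f, \varphi_m \rangle$ by a midpoint rule. The function names, the ordering of the basis functions, the convention that frequencies run up to $r$ inclusive, and the quadrature grid are assumptions made for the illustration.

```python
import math
from typing import Callable, List, Tuple

Func = Callable[[float], float]


def trig_basis(r: int) -> List[Func]:
    """Orthonormal trigonometric functions on [0, 1] with frequencies 0, ..., r (2r + 1 functions)."""
    basis: List[Func] = [lambda y: 1.0]
    for j in range(1, r + 1):
        basis.append(lambda y, j=j: math.sqrt(2.0) * math.cos(2.0 * math.pi * j * y))
        basis.append(lambda y, j=j: math.sqrt(2.0) * math.sin(2.0 * math.pi * j * y))
    return basis


def inner_product(f: Func, g: Func, n_grid: int = 2000) -> float:
    """Midpoint-rule approximation of the L^2([0, 1]) inner product <f, g>."""
    return sum(f((i + 0.5) / n_grid) * g((i + 0.5) / n_grid) for i in range(n_grid)) / n_grid


def project(f: Func, basis: List[Func], n_grid: int = 2000) -> Tuple[List[float], Func]:
    """Return the coefficients <f, phi_m> and the projection sum_m <f, phi_m> phi_m."""
    coeffs = [inner_product(f, phi, n_grid) for phi in basis]

    def f_M(y: float) -> float:
        return sum(c * phi(y) for c, phi in zip(coeffs, basis))

    return coeffs, f_M
```

For a density that already lies in the sieve, such as $f(y) = 1 + \tfrac{1}{2}\cos(2\pi y)$ with $r \geq 1$, the projection reproduces $f$ up to quadrature error.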


2.3 Assumptions 

We shall use the following assumptions on the hidden chain. 

[H1] The transition matrix $Q^\star$ has full rank,

[H2] The Markov chain $(X_n)_{n \geq 1}$ is irreducible and aperiodic,

[H3] The initial distribution $\pi^\star = (\pi_1^\star, \dots, \pi_K^\star)$ is the stationary distribution.

Notice that under [H1], [H2] and [H3], one has for all $k \in \mathcal{X}$, $\pi_k^\star \geq \pi_{\min}^\star > 0$. We shall use
the following assumption on the emission densities.

[H4] The family of emission densities $\mathcal{F}^\star := \{f_1^\star, \dots, f_K^\star\}$ is linearly independent.

Those assumptions appear in spectral methods, see for instance Hsu et al. (2012); Anandkumar et al.
(2012), and in identifiability issues, see for instance Allman et al. (2009); Gassiat et al.
(2015).


2.4 Identifiability lemma 

For any $\mathbf{f} = (f_1, \dots, f_K) \in (L^2(\mathcal{Y}, \mathcal{L}^D))^K$ and any transition matrix $Q$, denote by $g^{Q, \mathbf{f}}$ :
$\mathcal{Y}^3 \to \mathbb{R}$ the function given by

$$g^{Q, \mathbf{f}}(y_1, y_2, y_3) = \sum_{k_1, k_2, k_3 = 1}^{K} \pi(k_1) Q(k_1, k_2) Q(k_2, k_3) f_{k_1}(y_1) f_{k_2}(y_2) f_{k_3}(y_3), \tag{2}$$

where $\pi$ is the stationary distribution of $Q$. When $Q = Q^\star$ and $\mathbf{f} = \mathbf{f}^\star$, we get $g^{Q^\star, \mathbf{f}^\star} = g^\star$.
When $f_1, \dots, f_K$ are probability densities on $\mathcal{Y}$, $g^{Q, \mathbf{f}}$ is the probability distribution of three
consecutive observations of a stationary HMM. We now state a lemma that gathers all that
we need about identifiability.

For any transition matrix $Q$, let $\mathcal{T}_Q$ be the set of permutations $\tau$ such that for all $i$ and $j$,
$Q(\tau(i), \tau(j)) = Q(i, j)$. The permutations in $\mathcal{T}_Q$ describe how the states of the Markov chain
may be permuted without changing the distribution of the whole chain: for any $\tau$ in $\mathcal{T}_Q$,
$(\tau(X_n))_{n \geq 1}$ has the same distribution as $(X_n)_{n \geq 1}$. Since the hidden chain is not observed,
if the emission distributions are permuted using $\tau$, we get the same HMM. In other words,
if $\mathbf{f}_\tau = (f_{\tau(1)}, \dots, f_{\tau(K)})$ then $g^{Q, \mathbf{f}_\tau} = g^{Q, \mathbf{f}}$. Since identifiability up to permutation of the
hidden states is obtained from the marginal distribution of three consecutive observations,
we get the following lemma whose detailed proof is given in Section 8.1.

Lemma 3 Assume that $Q$ is a transition matrix for which [H1] and [H2] hold. Assume
that [H4] holds. Then for any $\mathbf{h} \in (L^2(\mathcal{Y}, \mathcal{L}^D))^K$,

$$g^{Q, \mathbf{f}^\star + \mathbf{h}} = g^{Q, \mathbf{f}^\star} \iff \exists \tau \in \mathcal{T}_Q \text{ such that } h_j = f_{\tau(j)}^\star - f_j^\star, \quad j = 1, \dots, K.$$

In particular, if $\mathcal{T}_Q$ reduces to the identity permutation,

$$g^{Q, \mathbf{f}^\star + \mathbf{h}} = g^{Q, \mathbf{f}^\star} \iff \mathbf{h} = (0, \dots, 0).$$
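
For small $K$, the set $\mathcal{T}_Q$ of permutations leaving the transition matrix invariant can be enumerated by brute force. The sketch below does so with states indexed from 0; the function name and the numerical tolerance are our own assumptions.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def symmetry_permutations(Q: Sequence[Sequence[float]], tol: float = 1e-12) -> List[Tuple[int, ...]]:
    """All permutations tau of {0, ..., K-1} with Q[tau(i)][tau(j)] == Q[i][j] for all i, j."""
    K = len(Q)
    return [
        tau
        for tau in permutations(range(K))
        if all(abs(Q[tau[i]][tau[j]] - Q[i][j]) <= tol for i in range(K) for j in range(K))
    ]


if __name__ == "__main__":
    print(symmetry_permutations([[0.7, 0.3], [0.2, 0.8]]))  # generic two-state matrix
    print(symmetry_permutations([[0.7, 0.3], [0.3, 0.7]]))  # symmetric two-state matrix
```

When the returned list contains only the identity, the second part of Lemma 3 applies and $\mathbf{h}$ must vanish.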


3. The penalized least-squares estimator 

In this section we shall estimate the emission densities using the so-called penalized least
squares method. Here, the least squares adjustment is made on the density $g^\star$ of $(Y_1, Y_2, Y_3)$.
Starting from the operator $\Gamma : t \mapsto \|t - g^\star\|_2^2 - \|g^\star\|_2^2 = \|t\|_2^2 - 2 \int t g^\star$, which is minimal
for the target $g^\star$, we introduce the corresponding empirical contrast $\gamma_N$. Namely, for any
$t \in L^2(\mathcal{Y}^3, (\mathcal{L}^D)^{\otimes 3})$, set

$$\gamma_N(t) = \|t\|_2^2 - \frac{2}{N} \sum_{s=1}^{N} t(Z_s),$$

with $Z_s := (Y_1^{(s)}, Y_2^{(s)}, Y_3^{(s)})$ (Scenario A) or $Z_s := (Y_s, Y_{s+1}, Y_{s+2})$ (Scenario B). As
$N$ tends to infinity, $\gamma_N(t) - \gamma_N(g^\star)$ converges almost surely to $\|t - g^\star\|_2^2$, thus the name
least squares contrast function. A natural estimator is then a function $t$ such that $\gamma_N(t)$
is minimal over a judicious approximation space which is a set of functions of form $g^{Q, \mathbf{f}}$,
$Q$ a transition matrix and $\mathbf{f} \in \mathcal{F}^K$, for $\mathcal{F}$ a subset of $L^2(\mathcal{Y}, \mathcal{L}^D)$. We thus define a whole
collection of estimates $\hat{g}_M$, each $M$ indexing an approximation subspace (also called model).
Considering (2), we shall introduce a collection of models of functions by projection of possible
$\mathbf{f}$'s on the subspaces $\mathfrak{P}_M$. Thus, for any irreducible transition matrix $Q$ with stationary
distribution $\pi$, we define $\mathcal{S}(Q, M)$ as the set of functions $g^{Q, \mathbf{f}}$ such that $\mathbf{f} \in \mathcal{F}^K$ and, for
each $k = 1, \dots, K$, there exists $(a_{mk})_{1 \leq m \leq M} \in \mathbb{R}^M$ such that

$$f_k = \sum_{m=1}^{M} a_{mk} \varphi_m.$$

We now assume that we have in hand an estimator $\hat{Q}$ of $Q^\star$. For instance, one can use
a spectral estimator; we recall such a construction in Section 5. Then, $(\mathcal{S}(\hat{Q}, M))_M$ is
the collection of models we use for the least squares minimization. For any $M$, define
$\hat{g}_M$ as a minimizer of $\gamma_N(t)$ for $t \in \mathcal{S}(\hat{Q}, M)$. Then $\hat{g}_M$ can be written as $\hat{g}_M = g^{\hat{Q}, \hat{\mathbf{f}}_M}$
with $\hat{\mathbf{f}}_M \in \mathcal{F}^K$ and $\hat{f}_{M,k} = \sum_{m=1}^{M} \hat{a}_{mk} \varphi_m$ for some $(\hat{a}_{mk})_{1 \leq m \leq M} \in \mathbb{R}^M$,
$k = 1, \dots, K$. It then remains to select the best model, that is to choose $M$ which minimizes
$\|\hat{g}_M - g^\star\|_2^2 - \|g^\star\|_2^2$. This quantity is close to $\gamma_N(\hat{g}_M)$, but we need to take into account
the deviations of the process $\Gamma - \gamma_N$. Then we rather minimize $\gamma_N(\hat{g}_M) + \mathrm{pen}(N, M)$ where
$\mathrm{pen}(N, M)$ is a penalty term to be specified. Our final estimator will be a penalized least
squares estimator. For this purpose we choose a penalty function $\mathrm{pen}(N, M)$ and we let

$$\hat{M} = \arg\min_{M = 1, \dots, N} \{\gamma_N(\hat{g}_M) + \mathrm{pen}(N, M)\}.$$

Notice that, with $N$ observations, we consider $N$ subspaces as candidates for model selection.
Then the estimator of $g^\star$ is $\hat{g} = \hat{g}_{\hat{M}}$ and the estimator of $\mathbf{f}^\star$ is $\hat{\mathbf{f}} := \hat{\mathbf{f}}_{\hat{M}}$ so that $\hat{g} = g^{\hat{Q}, \hat{\mathbf{f}}}$.
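
Because the basis is orthonormal, the contrast can be evaluated purely in terms of coefficients: if $t = \sum_{a,b,c} c_{abc}\, \varphi_a \otimes \varphi_b \otimes \varphi_c$, then $\gamma_N(t) = \sum c_{abc}^2 - 2 \sum c_{abc} \hat{m}_{abc}$, where $\hat{m}_{abc}$ is the empirical mean of $\varphi_a(Z_{s,1}) \varphi_b(Z_{s,2}) \varphi_c(Z_{s,3})$. The sketch below illustrates this together with the penalized choice of $M$. All names are hypothetical, the penalty form $\rho M \log N / N$ is an assumption matching the order $M \log N / N$ required by the oracle inequality, and the numerical minimization over the coefficients is not shown.

```python
import math
from typing import Dict, List, Sequence

Tensor = List[List[List[float]]]


def model_coefficients(pi: Sequence[float], Q: Sequence[Sequence[float]],
                       a: Sequence[Sequence[float]]) -> Tensor:
    """Tensor-basis coefficients of g^{Q,f} when f_k = sum_m a[m][k] * phi_m."""
    M, K = len(a), len(pi)
    # u[m][k2] = sum_{k1} pi[k1] Q[k1][k2] a[m][k1]  (first coordinate)
    u = [[sum(pi[k1] * Q[k1][k2] * a[m][k1] for k1 in range(K)) for k2 in range(K)] for m in range(M)]
    # w[m][k2] = sum_{k3} Q[k2][k3] a[m][k3]          (third coordinate)
    w = [[sum(Q[k2][k3] * a[m][k3] for k3 in range(K)) for k2 in range(K)] for m in range(M)]
    return [[[sum(u[i][k] * a[j][k] * w[l][k] for k in range(K)) for l in range(M)]
             for j in range(M)] for i in range(M)]


def contrast(c: Tensor, m_hat: Tensor) -> float:
    """gamma_N(t) = ||t||^2 - (2/N) sum_s t(Z_s), expressed with coefficients c and empirical moments m_hat."""
    sq_norm = sum(x * x for mat in c for row in mat for x in row)
    cross = sum(x * y for mc, mm in zip(c, m_hat) for rc, rm in zip(mc, mm) for x, y in zip(rc, rm))
    return sq_norm - 2.0 * cross


def select_model(contrasts: Dict[int, float], n_obs: int, rho: float) -> int:
    """Return the M minimizing gamma_N(g_M) + rho * M * log(N) / N over the supplied candidates."""
    return min(contrasts, key=lambda M: contrasts[M] + rho * M * math.log(n_obs) / n_obs)
```

Here `contrasts[M]` stands for the minimized value $\gamma_N(\hat{g}_M)$ obtained separately for each candidate dimension.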


The least squares estimator does not have an explicit form such as in usual nonparametric
estimation, so that one has to use numerical minimization algorithms. As initial point of the
minimization algorithm, we shall use the spectral estimator, see Section 6 for more details.
Since the spectral estimator is consistent, see De Castro et al. (2015), the algorithm does
not suffer from initialization problems.


4. Adaptive estimation of the emission distributions 
4.1 Oracle inequality for the estimation of $g^\star$

We now fix a subset $\mathcal{F}$ of $L^2(\mathcal{Y}, \mathcal{L}^D)$, and we shall use the following assumption:

[HF] $\mathcal{F}$ is a closed subset of $L^2(\mathcal{Y}, \mathcal{L}^D)$ such that: for any $f \in \mathcal{F}$, $\int f \, d\mathcal{L}^D = 1$, $\|f\|_2 \leq C_{\mathcal{F},2}$
and $\|f\|_\infty \leq C_{\mathcal{F},\infty}$ for some fixed positive $C_{\mathcal{F},2}$ and $C_{\mathcal{F},\infty}$.

Our first main result is an oracle inequality for the estimation of $g^\star$ which is stated below
and proved in Section 8.2. We denote by $\mathfrak{S}_K$ the set of permutations of $\{1, \dots, K\}$. When
$a$ is a vector, $\|a\|_2$ denotes its Euclidean norm, and when $A$ is a matrix, $\|A\|_F$ denotes its
Frobenius norm.


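Theorem 4 Assume [H1]-[H4] and [HF]. Assume also $\mathbf{f}^\star \in \mathcal{F}^K$, and for all $M$, $\mathbf{f}_M^\star \in \mathcal{F}^K$.
Then, there exist positive constants $N_0$, $\rho^\star$ and $A^\star$ (depending on $C_{\mathcal{F},2}$ and $C_{\mathcal{F},\infty}$ (Scenario
A) or on $Q^\star$, $C_{\mathcal{F},2}$ and $C_{\mathcal{F},\infty}$ (Scenario B)) such that, if

$$\mathrm{pen}(N, M) \geq \rho^\star \frac{M \log N}{N},$$

then for all $x > 0$, for all $N \geq N_0$, one has with probability $1 - (e - 1)^{-1} e^{-x}$, for any
permutation $\tau \in \mathfrak{S}_K$,

$$\|\hat{g} - g^\star\|_2^2 \leq 6 \inf_M \big\{\|g^\star - g^{Q^\star, \mathbf{f}_M^\star}\|_2^2 + \mathrm{pen}(N, M)\big\} + A^\star \frac{x}{N}
+ 18 C_{\mathcal{F},2}^6 \big(2 \|Q^\star - P_\tau \hat{Q} P_\tau^\top\|_F^2 + \|\pi^\star - P_\tau \hat{\pi}\|_2^2\big).$$

Here, $P_\tau$ is the permutation matrix associated to $\tau$.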


The important fact in this oracle inequality is that the minimal possible penalty is of
order $M/N$ (up to logarithmic terms) and not $M^3/N$ as is usually the case when estimating
a joint density of three random variables, so that we get a minimax rate adaptive estimator
of the joint density $g^\star$.


4.2 Main result 

The problem is now to deduce from Theorem 4 a result on $\|f_k^\star - \hat{f}_k\|_2^2$, $k = 1, \dots, K$. This is
the cornerstone of our work: we prove that, under a technical assumption on the parameters
of the unknown HMM, a direct lower bound links $\|\hat{g} - g^\star\|_2^2$ to $\sum_{k=1}^{K} \|f_k^\star - \hat{f}_k\|_2^2$ up to some
positive constant. Let us now describe the assumption and comment on its genericity.

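For any $\mathbf{f} \in \mathcal{F}^K$, define $G(\mathbf{f})$ the $K \times K$ matrix with coefficients $G(\mathbf{f})_{i,j} = \langle f_i, f_j \rangle$,
$i, j = 1, \dots, K$. Notice that under the assumption [H4], $G(\mathbf{f}^\star)$ is positive definite. Let $Q$
be a transition matrix verifying [H1]-[H2] and let $A_Q$ be the diagonal matrix having the
stationary distribution $\pi$ of $Q$ on the diagonal. We shall now define a quadratic form with
coefficients depending on $Q$ and $G(\mathbf{f})$. If $U$ is a $K \times K$ matrix such that $U \mathbf{1}_K = 0$,

$$\begin{aligned}
\mathcal{D} := \sum_{i,j=1}^{K} \Big\{ & (Q^\top A_Q U G(\mathbf{f}) U^\top A_Q Q)_{i,j}\, (G(\mathbf{f}))_{i,j}\, (Q G(\mathbf{f}) Q^\top)_{i,j} \\
& + (Q^\top A_Q G(\mathbf{f}) A_Q Q)_{i,j}\, (U G(\mathbf{f}) U^\top)_{i,j}\, (Q G(\mathbf{f}) Q^\top)_{i,j} \\
& + (Q^\top A_Q G(\mathbf{f}) A_Q Q)_{i,j}\, (G(\mathbf{f}))_{i,j}\, (Q U G(\mathbf{f}) U^\top Q^\top)_{i,j} \Big\} \\
+ 2 \sum_{i,j=1}^{K} \Big\{ & (Q^\top A_Q U G(\mathbf{f}) A_Q Q)_{i,j}\, (U G(\mathbf{f}))_{j,i}\, (Q G(\mathbf{f}) Q^\top)_{i,j} \\
& + (Q^\top A_Q U G(\mathbf{f}) A_Q Q)_{i,j}\, (Q U G(\mathbf{f}) Q^\top)_{j,i}\, (G(\mathbf{f}))_{i,j} \\
& + (U G(\mathbf{f}))_{i,j}\, (Q U G(\mathbf{f}) Q^\top)_{j,i}\, (Q^\top A_Q G(\mathbf{f}) A_Q Q)_{i,j} \Big\}
\end{aligned}$$

defines a semidefinite positive quadratic form $\mathcal{D}$ in the coefficients $U_{i,j}$, $i = 1, \dots, K$, $j =
1, \dots, K - 1$. The determinant of this quadratic form is a polynomial in the coefficients
of the matrices $Q$, $A_Q$ and $G(\mathbf{f})$. Since the coefficients of $A_Q$ are rational functions of the
coefficients of the matrix $Q$, this determinant is also a rational function of the coefficients
of the matrices $Q$ and $G(\mathbf{f})$. Define $H(Q, G(\mathbf{f}))$ the numerator of the determinant. Then
$H(Q, G(\mathbf{f}))$ is a polynomial in the coefficients of the matrices $Q$ and $G(\mathbf{f})$. Our assumption
will be: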


[HD] $H(Q^\star, G(\mathbf{f}^\star)) \neq 0$.

Since $H$ is a polynomial function of $Q_{i,j}^\star$, $i = 1, \dots, K$, $j = 1, \dots, K - 1$, and $\langle f_i^\star, f_j^\star \rangle$,
$i, j = 1, \dots, K$, the assumption [HD] is generically satisfied. We have been able to prove
that [HD] always holds in the case $K = 2$. We were only able to prove this result by direct
computation; it is given in Section 8.4.

Lemma 5 Assume $K = 2$. Then for all $Q^\star$ and $\mathbf{f}^\star$ such that [H1]-[H4] hold, one has
$H(Q^\star, G(\mathbf{f}^\star)) > 0$.

Notice now that, when [HD] and [H1]-[H3] hold, it is possible to define a compact neighborhood
$\mathcal{V}$ of $Q^\star$ such that, for all $Q \in \mathcal{V}$, $H(Q, G(\mathbf{f}^\star)) \neq 0$, [H1]-[H3] hold for $Q$ and
$\mathcal{T}_Q \subset \mathcal{T}_{Q^\star}$.

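For any $\mathbf{h} \in (L^2(\mathcal{Y}, \mathcal{L}^D))^K$, define $\|\mathbf{h}\|_{Q^\star}^2 := \min_{\tau \in \mathcal{T}_{Q^\star}} \big\{\sum_{k=1}^{K} \|h_k + f_k^\star - f_{\tau(k)}^\star\|_2^2\big\}$. Denote
$\|\mathbf{h}\|_2^2 := \sum_{k=1}^{K} \|h_k\|_2^2$. We may now state the theorem which is the cornerstone of our
main result.

Theorem 6 Assume [H1]-[H4] and [HD]. Let $\mathcal{K}$ be a closed bounded subset of $(L^2(\mathcal{Y}, \mathcal{L}^D))^K$
such that if $\mathbf{h} \in \mathcal{K}$, then $\int h_i \, d\mathcal{L}^D = 0$, $i = 1, \dots, K$. Let $\mathcal{V}$ be a compact neighborhood of $Q^\star$
such that, for all $Q \in \mathcal{V}$, $H(Q, G(\mathbf{f}^\star)) \neq 0$, [H1]-[H3] hold for $Q$ and $\mathcal{T}_Q \subset \mathcal{T}_{Q^\star}$. Then
there exists a positive constant $c(\mathcal{K}, \mathcal{V}, \mathcal{F}^\star)$ such that

$$\forall \mathbf{h} \in \mathcal{K}, \ \forall Q \in \mathcal{V}, \quad \|g^{Q, \mathbf{f}^\star + \mathbf{h}} - g^{Q, \mathbf{f}^\star}\|_2 \geq c(\mathcal{K}, \mathcal{V}, \mathcal{F}^\star) \|\mathbf{h}\|_{Q^\star}.$$

This theorem is proved in Section 8.3.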

We are now ready to prove our main result on the penalized least squares estimator of
the emission densities. The following theorem gives an oracle inequality for the estimators
of the emission distributions provided the penalty is adequately chosen. It is proved in
Section 8.5.


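Theorem 7 (Adaptive estimation) Assume [H1]-[H4], [HF] and [HD]. Assume also
that for all $M$, $\mathbf{f}_M^\star \in \mathcal{F}^K$. Let $\mathcal{V}$ be a compact neighborhood of $Q^\star$ such that, for all $Q \in \mathcal{V}$,
$H(Q, G(\mathbf{f}^\star)) \neq 0$ and [H1]-[H3] hold for $Q$. Then, there exists a positive constant $A^\star$
(depending on $\mathcal{V}$, $\mathcal{F}^\star$, $C_{\mathcal{F},2}$ and $C_{\mathcal{F},\infty}$) and positive constants $N_0$ and $\rho^\star$ (depending on $C_{\mathcal{F},2}$
and $C_{\mathcal{F},\infty}$ (Scenario A) or on $Q^\star$, $C_{\mathcal{F},2}$ and $C_{\mathcal{F},\infty}$ (Scenario B)) such that, if

$$\mathrm{pen}(N, M) \geq \rho^\star \frac{M \log N}{N},$$

then for all $x > 0$, for all $N \geq N_0$, for any permutation $\tau_N \in \mathfrak{S}_K$, with probability larger
than $1 - (e - 1)^{-1} e^{-x} - \mathbb{P}(P_{\tau_N} \hat{Q} P_{\tau_N}^\top \notin \mathcal{V})$, there exists $\tau \in \mathcal{T}_{Q^\star}$ such that

$$\sum_{k=1}^{K} \|f_{\tau(k)}^\star - \hat{f}_{\tau_N(k)}\|_2^2 \leq A^\star \Big[ \inf_M \Big\{ \sum_{k=1}^{K} \|f_k^\star - f_{M,k}^\star\|_2^2 + \mathrm{pen}(N, M) \Big\}
+ \|Q^\star - P_{\tau_N} \hat{Q} P_{\tau_N}^\top\|_F^2 + \|\pi^\star - P_{\tau_N} \hat{\pi}\|_2^2 + \frac{x}{N} \Big].$$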

Remark 8 As usual in HMM or mixture models, it is only possible to estimate the model up
to label switching of the hidden states; this is the meaning of the permutation $\tau_N$.


Remark 9 An important consequence of the theorem is that a right choice of the penalty
leads to a rate minimax adaptive estimator up to a $\log N$ term, see Corollary 10 below.
For this purpose, one has to choose an estimator $\hat{Q}$ of $Q^\star$ which is, up to label switching,
consistent with controlled rate. One possible choice is a spectral estimator.

To apply Theorem 7 one has to choose an estimator $\hat{Q}$ with controlled behavior, to be
able to evaluate the probability of the event $\{P_{\tau_N} \hat{Q} P_{\tau_N}^\top \in \mathcal{V}\}$ and the rate of convergence
of $P_{\tau_N} \hat{Q} P_{\tau_N}^\top$ and $P_{\tau_N} \hat{\pi}$. One possibility is to use the spectral estimator described in Section
5. To get the following result (proved in Section 8.6), we propose to use the spectral estimator
with, for each $N$, the dimension $M_N$ chosen such that $\eta_3(\Phi_{M_N}) = O((\log N)^{1/4})$, see
Section 5 for a definition of $\eta_3$.

Corollary 10 With this choice of $\hat{Q}$, under the assumptions of Theorem 7, there exists a
sequence of permutations $\tau_N \in \mathfrak{S}_K$ such that as $N$ tends to infinity,

$$\mathbb{E}\Big[\sum_{k=1}^{K} \|f_k^\star - \hat{f}_{\tau_N(k)}\|_2^2\Big] = O\Big( \inf_{M'} \Big\{ \sum_{k=1}^{K} \|f_k^\star - f_{M',k}^\star\|_2^2 + \mathrm{pen}(N, M') \Big\} + \frac{\log N}{N} \Big).$$


Thus, choosing $\mathrm{pen}(N, M) = \rho M \log N / N$ for a large $\rho$ leads to the minimax asymptotic
rate of convergence up to a power of $\log N$. Indeed, standard results in approximation
theory (see DeVore and Lorentz (1993) for instance) show that one can upper bound the
approximation error $\|f_k^\star - f_{M,k}^\star\|_2$ by $O(M^{-s/D})$ where $s > 0$ denotes a regularity parameter.
Then the trade-off is obtained for $M^{1/D} \sim (N / \log N)^{1/(2s+D)}$, which leads to the quasi-optimal
rate $(N / \log N)^{-s/(2s+D)}$ for the nonparametric estimation when the minimal smoothness of the
emission densities is $s$. Notice that the algorithm automatically selects the best $M$ leading
to this rate.
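
The trade-off between approximation error and penalty can be checked numerically. The sketch below minimizes, over integer $M$, a bound of the form $M^{-2s/D} + \rho M \log N / N$ and compares the minimizer with the predicted order $(N / \log N)^{D/(2s+D)}$. The bound (with all constants set to one), the function names and the parameter values are assumptions made for the illustration; in the procedure itself $M$ is selected from the data through the penalized contrast, not from such a formula.

```python
import math


def best_dimension(n_obs: int, s: float, D: int = 1, rho: float = 1.0) -> int:
    """Integer M in {1, ..., N} minimizing M^(-2s/D) + rho * M * log(N) / N."""
    def bound(M: int) -> float:
        return M ** (-2.0 * s / D) + rho * M * math.log(n_obs) / n_obs
    return min(range(1, n_obs + 1), key=bound)


def predicted_dimension(n_obs: int, s: float, D: int = 1) -> float:
    """Order of magnitude (N / log N)^(D / (2s + D)) obtained by balancing the two terms."""
    return (n_obs / math.log(n_obs)) ** (D / (2.0 * s + D))


if __name__ == "__main__":
    for n_obs in (10**3, 10**4, 10**5):
        print(n_obs, best_dimension(n_obs, s=2.0), round(predicted_dimension(n_obs, s=2.0), 2))
```

The two quantities agree up to a constant factor, which is all that the rate statement requires.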

To implement the estimator, it remains to choose a value for $\rho$ in the penalty. The
calibration of this parameter is a classical issue and could be the subject of a full paper. In
practice one can use the slope heuristic as in Baudry et al. (2012).


5. Nonparametric spectral method 

This section is devoted to a short description of the nonparametric spectral method for sake
of completeness: we describe the algorithm, and give the results we need to support the use
of spectral estimators to initialize our algorithm. A detailed study of the nonparametric
spectral method is given in De Castro et al. (2015).

The following procedure (see Algorithm 1) describes a tractable approach to estimate the
transition matrix in a way that can be used for the penalized least squares estimator of the
emission densities, and also for the estimation of the projections of the emission densities
that may be used to initialize the least squares algorithm. The procedure is based on
recent developments in parametric estimation of HMMs. For each fixed $M$, we estimate the
projection of the emission distributions on the basis $\Phi_M$ using the spectral method proposed
in Anandkumar et al. (2012). As the authors of the latter paper explain, this allows further
to estimate the transition matrix (we use a modified version of their estimator), and we set
the estimator of the stationary distribution as the stationary distribution of the estimator
of the transition matrix. The computation of those estimators is particularly simple: it
is based on one SVD, some matrix inversions and one diagonalization. One can prove that,
with overwhelming probability, all matrix inversions and the diagonalization can be done
rightfully, see De Castro et al. (2015). In the following, when $A$ is a $(p \times q)$ matrix with
$p \geq q$, $A^\top$ denotes the transpose matrix of $A$, $A(k, l)$ its $(k, l)$th entry, $A(\cdot, l)$ its $l$th column
and $A(k, \cdot)$ its $k$th line. When $v$ is a vector of size $p$, we denote by $\mathrm{Diag}[v]$ the diagonal
matrix with diagonal entries $v_i$ and, by abuse of notation, $\mathrm{Diag}[v] = \mathrm{Diag}[v^\top]$.

We now state a result which allows to derive the asymptotic properties of the spectral 
estimators. Let us define: 

M 

vi ($m) := sup ^ (p a {yi)p h (y 2 )y c {yz) ~ Va(y'i)Tb(y'2)Vc{y'3)f ■ 

ViV'cy 3 a,b,c= 1 
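Algorithm 1: Nonparametric spectral estimation of HMMs
Data: An observed chain $(Y_1,\dots,Y_N)$ and a number of hidden states $K$.
Result: Spectral estimators $\hat\pi$, $\hat Q$ and $(\hat f_{M,k})_{k \in \mathcal{X}}$.

[Step 1] Consider the following empirical estimators: For any $a, b, c$ in $\{1,\dots,M\}$,
\[
\hat L_M(a) := \frac{1}{N}\sum_{s=1}^{N} \varphi_a(Y_1^{(s)}), \qquad
\hat M_M(a,b,c) := \frac{1}{N}\sum_{s=1}^{N} \varphi_a(Y_1^{(s)})\varphi_b(Y_2^{(s)})\varphi_c(Y_3^{(s)}),
\]
\[
\hat N_M(a,b) := \frac{1}{N}\sum_{s=1}^{N} \varphi_a(Y_1^{(s)})\varphi_b(Y_2^{(s)}), \qquad
\hat P_M(a,c) := \frac{1}{N}\sum_{s=1}^{N} \varphi_a(Y_1^{(s)})\varphi_c(Y_3^{(s)}).
\]

[Step 2] Let $\hat U$ be the $M \times K$ matrix of orthonormal right singular vectors of $\hat P_M$
corresponding to its top $K$ singular values.

[Step 3] Form the matrices for all $b \in \{1,\dots,M\}$, $\hat B(b) := (\hat U^\top \hat P_M \hat U)^{-1} \hat U^\top \hat M_M(\cdot, b, \cdot)\hat U$.

[Step 4] Set $\Theta$ a $(K \times K)$ random unitary matrix uniformly drawn and form the matrices
for all $k \in \{1,\dots,K\}$, $\hat C(k) := \sum_{b=1}^{M} (\hat U\Theta)(b,k)\hat B(b)$.

[Step 5] Compute $\hat R$ a $(K \times K)$ unit Euclidean norm columns matrix that diagonalizes
the matrix $\hat C(1)$: $\hat R^{-1}\hat C(1)\hat R = \mathrm{Diag}[(\hat\Lambda(1,1),\dots,\hat\Lambda(1,K))]$.

[Step 6] Set for all $k, k' \in \mathcal{X}$, $\hat\Lambda(k,k') := (\hat R^{-1}\hat C(k)\hat R)(k',k')$ and $\hat O_M := \hat U\Theta\hat\Lambda$.

[Step 7] Consider the emission laws estimator $\hat f := (\hat f_{M,k})_{k \in \mathcal{X}}$ defined by for all $k \in \mathcal{X}$,
$\hat f_{M,k} := \sum_{m=1}^{M} \hat O_M(m,k)\varphi_m$.

[Step 8] Set $\tilde\pi := (\hat U^\top \hat O_M)^{-1}\hat U^\top \hat L_M$.

[Step 9] Consider the transition matrix estimator:
\[
\hat Q := \Pi_{TM}\Big( (\hat U^\top \hat O_M \mathrm{Diag}[\tilde\pi])^{-1}\, \hat U^\top \hat N_M \hat U\, (\hat O_M^\top \hat U)^{-1} \Big),
\]
where $\Pi_{TM}$ denotes the projection onto the convex set of transition matrices,
and define $\hat\pi$ as the stationary distribution of $\hat Q$.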
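
The first two steps of Algorithm 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: NumPy is used, the basis is the regular histogram basis on $[0,1)$, the triples are consecutive observations of a single chain (so there are $N-2$ of them), and all function names are hypothetical. The data are i.i.d. beta draws used purely to exercise the code, not an HMM sample.

```python
import numpy as np

def histogram_basis(y, M):
    """Regular histogram basis on [0, 1): phi_a(y) = sqrt(M) * 1{y in bin a}."""
    y = np.asarray(y, dtype=float)
    bins = np.minimum((y * M).astype(int), M - 1)
    Phi = np.zeros((y.size, M))
    Phi[np.arange(y.size), bins] = np.sqrt(M)
    return Phi

def empirical_moments(Y, M):
    """Step 1: empirical L, N, P and the third-order tensor from consecutive triples."""
    Phi = histogram_basis(Y, M)
    P1, P2, P3 = Phi[:-2], Phi[1:-1], Phi[2:]   # phi(Y_s), phi(Y_{s+1}), phi(Y_{s+2})
    n = P1.shape[0]
    L_hat = P1.mean(axis=0)
    N_hat = P1.T @ P2 / n
    P_hat = P1.T @ P3 / n
    M_hat = np.einsum("sa,sb,sc->abc", P1, P2, P3) / n   # cost O(n M^3)
    return L_hat, N_hat, P_hat, M_hat

def top_right_singular_vectors(P_hat, K):
    """Step 2: M x K matrix of right singular vectors for the top K singular values."""
    _, _, Vt = np.linalg.svd(P_hat)
    return Vt[:K].T

# Placeholder data (assumption): i.i.d. beta draws, only to run the functions.
rng = np.random.default_rng(0)
Y = rng.beta(2.0, 5.0, size=2000)
L_hat, N_hat, P_hat, M_hat = empirical_moments(Y, M=5)
U_hat = top_right_singular_vectors(P_hat, K=2)
```

With the histogram basis, summing the tensor over its middle index and dividing by $\sqrt{M}$ recovers $\hat P_M$, which gives a quick sanity check of the implementation.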


Note that in the examples (Spline), (Trig.) and (Wav.) we have:

\[
\eta_3(\Phi_M) \le C_\eta M^{3/2},
\]

where $C_\eta > 0$ is a constant. The following theorem is proved in De Castro et al. (2015).
Its statement concerns (Scenario B) (same chain sampling) and the interested reader may
consult De Castro et al. (2015) for its statement under (Scenario A).


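Theorem 11 (Spectral estimators) Assume that [H1]-[H4] hold. Then, there exist positive
constant numbers $M_{\mathcal{F}^*}$, $x(Q^*)$, $C(Q^*,\mathcal{F}^*)$ and $N(Q^*,\mathcal{F}^*)$ such that the following holds.
For any $x \ge x(Q^*)$, for any $\delta \in (0,1)$, for any $M \ge M_{\mathcal{F}^*}$, there exists a permutation
$\tau_M \in \mathfrak{S}_K$ such that the spectral method estimators $\hat f_{M,k}$, $\hat\pi$ and $\hat Q$ satisfy: For any
$N \ge N(Q^*,\mathcal{F}^*)\,\eta_3(\Phi_M)^2\, x\,(-\log\delta)/\delta^2$, with probability greater than $1 - 2\delta - 4e^{-x}$,

\[
\| f^*_{M,k} - \hat f_{M,\tau_M(k)} \|_2 \le C(Q^*,\mathcal{F}^*)\, \frac{\sqrt{-\log\delta}}{\delta}\, \frac{\eta_3(\Phi_M)}{\sqrt{N}}\, \sqrt{x},
\]
\[
\| \pi^* - P_{\tau_M}\hat\pi \|_2 \le C(Q^*,\mathcal{F}^*)\, \frac{\sqrt{-\log\delta}}{\delta}\, \frac{\eta_3(\Phi_M)}{\sqrt{N}}\, \sqrt{x},
\]
\[
\| Q^* - P_{\tau_M}\hat Q P_{\tau_M}^\top \| \le C(Q^*,\mathcal{F}^*)\, \frac{\sqrt{-\log\delta}}{\delta}\, \frac{\eta_3(\Phi_M)}{\sqrt{N}}\, \sqrt{x}.
\]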


6. Numerical experiments 
6.1 General description 

In this section we present the numerical performances of our method. We recall that the 
experimenter knows nothing about the underlying hidden Markov model but the number 
of hidden states K. In this set of experiments, we consider the regular histogram basis or 
the trigonometric basis for estimating emission laws given by beta laws from a single chain 
observation of length N = 5e4. 

Our procedure is based on the computation of the empirical least squares estimators
$\hat g_M$ defined as minimizers of the empirical contrast $\gamma_N$ on the space $S(\hat Q, M)$, where $\hat Q$ is
an estimator of the transition matrix (for instance the spectral estimation of the transition
matrix). Since the function $\gamma_N$ is non-convex, we use a second order approach estimating
a positive definite matrix (using a covariance matrix) within an iterative procedure called
CMAES for Covariance Matrix Adaptation Evolution Strategy, see Hansen (2006). Using
this latter algorithm, we search for the minimum of $\gamma_N$ with starting point the spectral
estimation of the emission laws.

Then, we estimate the size of the model thanks to

\[
\hat M(\rho) \in \arg\min_{M = 1,\dots,M_{\max}} \Big\{ \gamma_N(\hat g_M) + \rho\, \frac{M \log N}{N} \Big\}, \tag{3}
\]

where the penalty term $\rho$ has to be tuned and the maximum size of the model $M_{\max}$ can be
set by the experimenter in a data-driven procedure.

Indeed, we shall apply the slope heuristic to adjust the penalty term and to choose $M_{\max}$.
As presented in Baudry et al. (2012), the minimum contrast function $M \mapsto \gamma_N(\hat g_M)$ should
have a linear behavior for large values of $M$. The experimenter has to consider $M_{\max}$ large
enough in order to observe this linear stabilization, as depicted in Figure 2. The slope of the
linear interpolation is then $(\hat\rho/2) \log N/N$ (recall that the sample size $N$ is fixed here), where
$\hat\rho$ is the slope heuristic choice on how $\rho$ should be tuned. Another procedure (theoretically
equivalent) consists in plotting the function $\rho \mapsto \hat M(\rho)$, which is a non-increasing piecewise
constant function. The estimated $\hat\rho$ is such that the largest drop (called "dimension jump")
of this function occurs at point $\hat\rho/2$. We illustrate this procedure in Figure 3, where one can
clearly locate the jump and deduce the size $\hat M$.
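
The dimension-jump variant of the slope heuristic described above can be sketched as follows. This is a minimal illustration assuming NumPy, not the CAPUSHE implementation: the function names, the grid of penalty constants and the synthetic contrast values are all hypothetical.

```python
import numpy as np

def select_dimension(contrast, N, rho):
    """Return the M in {1, ..., M_max} minimizing contrast[M-1] + rho * M * log(N) / N."""
    M = np.arange(1, len(contrast) + 1)
    crit = np.asarray(contrast) + rho * M * np.log(N) / N
    return int(M[np.argmin(crit)])

def dimension_jump(contrast, N, rho_grid):
    """Find the largest drop of rho -> M_hat(rho) on a grid.

    Returns (rho_jump, rho_hat) with rho_hat = 2 * rho_jump, following the rule
    that the largest drop occurs at rho_hat / 2.
    """
    sizes = np.array([select_dimension(contrast, N, r) for r in rho_grid])
    drops = sizes[:-1] - sizes[1:]
    j = int(np.argmax(drops))
    rho_jump = float(rho_grid[j + 1])
    return rho_jump, 2.0 * rho_jump

# Synthetic contrast (assumption): linear decrease with slope -kappa*log(N)/N
# for large M, plus a bias term that vanishes for M >= 10.
N, M_max, kappa = 50_000, 50, 1.12
c = np.log(N) / N
M = np.arange(1, M_max + 1)
contrast = -kappa * c * M + 20.0 * c * np.maximum(0, 10 - M)

rho_grid = np.linspace(0.05, 3.0, 60)
rho_jump, rho_hat = dimension_jump(contrast, N, rho_grid)
M_hat = select_dimension(contrast, N, rho_hat)
```

On this synthetic example the selected size stays at $M_{\max}$ while $\rho$ is below the jump and falls to the point where the bias vanishes once $\rho$ exceeds it.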

To summarize, our procedure reads as follows. 

1. For all $M \le M_{\max}$, compute the spectral estimations $(\hat Q, \hat\pi)$ of the transition matrix
and its stationary distribution and the spectral estimation $\hat f$ of the emission laws. This
is straightforward using the procedure described by [Step 1-9] in Section 5.

2. For all $M \le M_{\max}$, compute a minimum $\hat g_M$ of the empirical contrast function $\gamma_N$
using "Covariance Matrix Adaptation Evolution Strategy", see Hansen (2006). Use
the estimation $\hat f$ of the spectral method as a starting point of CMAES.

3. Tune the penalty term using the slope heuristic procedure and select $\hat M$.

4. Return the emission laws of the solution of point (2) for $M = \hat M$.


Note that the size $M$ of the projection space for the spectral estimator has been set as the
one chosen by the slope heuristic for the empirical least squares estimators.

All the codes of the numerical experiments are available at https://mycore.core-cloud.net/public.php?
We shall indicate that the slope heuristic has been done using CAPUSHE, the Matlab graphical
user interface presented in Baudry et al. (2012).


6.2 Complexity 

A crucial step of our method lies in computing the empirical least squares estimators $\hat g_M$.
This computation may be difficult since the function $\gamma_N$ is non-convex. It follows that an
acceptable procedure must start from a good approximation of $\hat g_M$. This is done by the spectral
method. Observe that the key leitmotiv throughout this paper is a two-step estimation
procedure that starts with the spectral estimator. The latter has a rate of convergence of the
order of $N^{-s/(2s+3)}$ and seems to be a good candidate to initialize an iterative scheme that
will converge towards $\hat g_M$. It follows that the most time-consuming operations in our algorithm
are the following steps.

• The computation of the tensor $\hat M_M$ of the empirical law of three consecutive observations,
where we use three loops of size $M$ and one loop of size $N$ so the complexity is
$O(NM^3)$,

• The singular value decomposition of $\hat P_M$ in the spectral method (complexity: $O(M^3)$),

• The computation of the minimum of the empirical contrast function: cost of one
evaluation of the empirical contrast function $O(K^3M^3) = O(M^3)$ times the number
$f(M,K)$ of evaluations while minimizing the empirical contrast. Recall that we start
from the spectral estimator solution to get the minimum so a constant number of
evaluations is enough in practice, say stopeval = 1e4 using CMAES.


We have to compute the minimal contrast value for all models of size $M = 1,\dots,M_{\max}$,
where $M_{\max}$ has to be chosen so that one can apply the slope heuristic. We deduce that the
overall complexity of our algorithm is $O\big((f(M_{\max},K)K^3 \vee N)\,M_{\max}^4\big)$, where $f(M_{\max},K)$ is
the number of evaluations of $\gamma_N$ while minimizing the empirical contrast. Since we use the
spectral estimator as a starting point of the minimization of the empirical contrast, we believe
that $f(M_{\max},K)$ can be considered as constant, say 1e4. Note that the upper bound $M_{\max}$
has to be large enough in order to observe a linear stabilization of $M \mapsto \gamma_N(\hat g_M)$, see Baudry et al.
(2012) for instance. Moreover, recall that the trade-off between the approximation bias and
the penalty term (accounting for the standard error of the empirical law) is obtained for
$M \asymp (N/\log N)^{D/(2s+D)}$, where $s > 0$ denotes the minimal smoothness parameter of the emission
laws. In order to properly apply the slope heuristic, it is enough to consider models with
this order of magnitude, so that $M_{\max} = O\big((N/\log N)^{D/(2s+D)}\big)$. It follows that the overall
complexity of our procedure can be expressed in terms of the minimal smoothness parameter
$s$ of the emission laws as

\[
\text{Complexity} = O\big(N^{1 + \frac{4D}{2s+D}}\big),
\]

as soon as $K = O(N^{1/3})$, which is a reasonable assumption. Nevertheless, this theoretical
bound is unknown to the practitioner since it involves the unknown minimal smoothness
parameter $s > 0$. For chains of length $O(\text{1e5})$, we have witnessed that one can afford a
maximal model size $M_{\max} \le 50$ and this allows to consider problems where typical sizes of
$M$ range between 1 and 50. All numerical experiments of this paper fall in this frame.

Figure 1: Variance comparison of the spectral and empirical least squares estimators. The
upper curve (in red) presents the performance (median value of the variance over 40
iterations) of the spectral method while the lower curve (in blue) presents the performance
of the empirical least squares estimator. For each curve, we have plotted a shaded
box plot representing the first and third quartiles.

6.3 Comparison of the variances 

The quadratic loss can be expressed as a variance term and a bias term as follows

\[
\forall\, 1 \le k \le K,\ \forall M > 0, \qquad \|\hat f_k - f_k^*\|_2^2 = \|f_k^* - f^*_{M,k}\|_2^2 + \|\hat f_k - f^*_{M,k}\|_2^2 ,
\]

where $f^*_{M,k}$ is the orthogonal projection of $f_k^*$ on $\mathfrak{P}_M$ and $\hat f_k$ is any estimator such that $\hat f_k$
belongs to $\mathfrak{P}_M$. Note that the bias term $\|f_k^* - f^*_{M,k}\|_2^2$ does not depend on the estimator $\hat f_k$.
Hence, the variance term

\[
\mathrm{Variance}_M(\hat f) := \min_{\tau \in \mathfrak{S}_K}\ \max_{1 \le k \le K} \|\hat f_k - f^*_{M,\tau(k)}\|_2^2
\]

accounts for the performances of the estimator $\hat f$.
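
A minimal sketch of how this variance term could be computed, assuming both the estimators and the projected true densities are stored as coefficient vectors in the same orthonormal basis (so that $L^2$ distances reduce to Euclidean distances). The function name and the toy inputs are hypothetical; the brute-force search over permutations is only reasonable for a small number of states $K$.

```python
import itertools
import numpy as np

def variance_term(coef_hat, coef_proj):
    """min over permutations tau of max_k ||f_hat_k - f*_{M,tau(k)}||_2^2.

    coef_hat, coef_proj: (K, M) arrays of coefficients in an orthonormal basis.
    """
    K = coef_hat.shape[0]
    # dist[k, j] = squared distance between estimator k and projected density j
    dist = ((coef_hat[:, None, :] - coef_proj[None, :, :]) ** 2).sum(axis=2)
    return min(
        max(dist[k, tau[k]] for k in range(K))
        for tau in itertools.permutations(range(K))
    )

# Toy example: the estimator recovers the two densities with swapped labels
# and an error of 0.1 on each of the three coefficients.
true = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
est = true[::-1] + 0.1
value = variance_term(est, true)
```

The minimum over permutations makes the criterion insensitive to the labelling of the hidden states, which is only identifiable up to permutation.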
Figure 2: Slope heuristic to choose $M$: the experimenter may observe a linear stabilization
of the empirical contrast $\gamma_N$ for estimating beta emission laws of parameters (2,5)
and (4,2). We have $K = 2$ hidden states and $N$ = 5e4 samples along a single
chain. On the left panel we have used the histogram basis as approximation
space, the stabilization occurs on the points $M = 30$ to $M = 50$ and the interpolation
of the slope leads to $\hat M = 23$. On the right panel we have considered the
trigonometric basis, the stabilization occurs on the points $M = 20$ to $M = 50$ and
it leads to $\hat M = 21$.


As depicted in Figure 1, we have compared, for each $M$, the variance terms obtained
by the spectral method and the empirical least squares method over 40 iterations on chains
of length $N$ = 5e4. We have considered $K = 2$ hidden states whose emission variables
are distributed with respect to beta laws of parameters (2,5) and (4,2). This numerical
experiment consolidates the idea that the least squares method significantly improves upon
the spectral method. Indeed, even for small values of $M$, one may see in Figure 1 that the
variance term is divided by a constant factor.


6.4 Histogram basis and trigonometric basis as approximation spaces 

An illustrative example of our method can be given using the histogram basis (regular basis
with $M$ bins) or the trigonometric basis. In the following experiments, we have $K = 2$
hidden states and emission laws given by beta laws of parameters (2,5) and (4,2). Recall that
we observe a single chain of length $N$ = 5e4.
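
For readers who wish to reproduce a data set of this kind, the following Python sketch simulates a stationary two-state HMM with beta emission laws of parameters (2,5) and (4,2). It assumes NumPy; the transition matrix below is a hypothetical choice (the text does not state the one used in the experiments), and a shorter chain than $N$ = 5e4 is drawn to keep the example fast.

```python
import numpy as np

def simulate_hmm(N, Q, beta_params, rng):
    """Draw a stationary hidden chain X and observations Y with beta emission laws."""
    Q = np.asarray(Q, dtype=float)
    K = Q.shape[0]
    # Stationary distribution: left eigenvector of Q for the eigenvalue 1.
    w, v = np.linalg.eig(Q.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    X = np.empty(N, dtype=int)
    X[0] = rng.choice(K, p=pi)
    for j in range(1, N):
        X[j] = rng.choice(K, p=Q[X[j - 1]])
    a = np.array([p[0] for p in beta_params])
    b = np.array([p[1] for p in beta_params])
    # Conditionally on X, the Y_j are independent with law Beta(a[X_j], b[X_j]).
    Y = rng.beta(a[X], b[X])
    return X, Y

rng = np.random.default_rng(0)
Q = [[0.7, 0.3], [0.2, 0.8]]          # hypothetical transition matrix
X, Y = simulate_hmm(2000, Q, [(2.0, 5.0), (4.0, 2.0)], rng)
```

Only `Y` would be passed to the estimation procedure; `X` is kept here to check the simulation against the known emission laws.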

We begin with the computation of the minimum contrast function $M \mapsto \gamma_N(\hat g_M)$, as
depicted in Figure 2. Observe that the slope of this function clearly stabilizes at a
critical value, referred to as $\hat\rho/2$, in both the histogram and the trigonometric case. This leads
to an adaptive choice of $\hat M = 23$ for the histogram basis and $\hat M = 21$ for the trigonometric
basis, see Figures 2 and 3.

Furthermore, one can see in Figure 4 that our method also qualitatively improves upon
the spectral method in both the histogram and the trigonometric case.
Figure 3: Slope heuristic to choose $M$: the experimenter observes the largest drop of the
function $\rho \mapsto \hat M(\rho)$ at 1.1 so that $\hat\rho = 2.2$ and $\hat M = 23$. We have $K = 2$ hidden
states and a single chain of length $N$ = 5e4. We have used the histogram basis as
approximation space.


6.5 Three states 

Our method can be performed for $K > 2$ as illustrated in Figure 5. In this example $K = 3$,
the sample size is $N$ = 5e4 and the emission laws are three beta distributions with parameters
(1.5,5), (6,6) and (7,2). Note that the number of hidden states $K$ does not really impact
the complexity of the algorithm, as we have seen in Section 6.2.

In this example, we were able to observe a linear stabilization of the minimum contrast
function. The slope heuristic procedure led to an adaptive choice $\hat M = 25$.


7. Discussion 


We have proposed a penalized least squares method to estimate the emission densities of
the hidden chain when the transition matrix of the hidden chain is full rank and the emission
probability distributions are linearly independent. The algorithm may be initialized
using spectral estimators. The obtained estimators are adaptive rate optimal up to a log
factor, where adaptivity is upon the family of emission densities. The results hold under an
assumption on the parameter that holds generically. We have proved that this assumption
is always verified when there are two hidden states. We did not find a general argument to
prove that the assumption always holds when $K > 2$, and a natural question is whether,
when the number of hidden states is $K > 2$, this assumption is also always verified.

It is proved in Alexandrovich and Holzmann (2014) that identifiability holds as soon as
$f_1^*,\dots,f_K^*$ are distinct densities. The identifiability is obtained in that case using the
marginal distribution of dimension $2K+1$, that is the marginal distribution of $Y_1,\dots,Y_{2K+1}$.
Thus, to get consistent estimators, one needs to use the joint distribution of $2K+1$ consecutive
observations. Though linear independence is generically satisfied, one may wonder
what happens when emission densities are close to being linearly dependent. Simulations in
Lehéricy (2015a) show that estimation becomes harder. In those practical situations where
estimation becomes difficult, it is observed that the Gram matrix of $f_1^*,\dots,f_K^*$ has an
eigenvalue close to 0. On the theoretical side, the proof of Theorem 6 uses the linear independence
of the emission densities by using that Gram matrices are positive. An interesting problem
would be to investigate if it is possible to estimate the emission densities with the classical
adaptive rate for density estimation when the emission densities are linearly dependent
(though all distinct). It is possible using model selection to get the classical rate for the
estimation of the density of $2K+1$ consecutive observations, but it does not seem obvious
to see whether this rate can be transferred to the estimators of the emission densities. This
is the subject of further work, see Lehéricy (2015b).

Another question arising from our work is whether it is possible to adapt to different
smoothnesses of the emission densities.

[Figure 4 panels: Emission law 1, Emission law 2]

Figure 4: Estimators of the emission densities (beta laws of parameters (2,5) and (4,2))
from the observation of a single chain of length $N$ = 5e4. On the top panels,
we have used the histogram basis ($M = 23$). On the bottom panels, we have
considered the trigonometric basis ($M = 21$).
[Figure 5 panels: Emission law 1, Emission law 2, Emission law 3]

Figure 5: Estimation of three densities given by beta laws of parameters (1.5,5), (6,6) and
(7,2) from a single chain of length $N$ = 5e4. We have used the histogram basis
and we have found $\hat M = 25$ using the slope heuristic.


8. Proofs 


8.1 Proof of Lemma 3

In Hsu et al. (2012) it is proved that when [H1], [H2], [H3] hold and when the rank of
the matrix $O_M := (\langle \varphi_m, f_k^* \rangle)_{1 \le m \le M,\, 1 \le k \le K}$ is $K$, the knowledge of the tensor $M_M$ given by
$M_M(a,b,c) = E(\varphi_a(Y_1)\varphi_b(Y_2)\varphi_c(Y_3))$ for all $a, b, c$ in $\{1,\dots,M\}$ allows to recover $O_M$ and
$Q$ up to relabelling of the hidden states. Thus, when [H1], [H2], [H3] and [H4] hold, the
knowledge of $g$ is equivalent to the knowledge of the sequence $(M_M)_M$, which allows to
recover $Q$ and the sequence $(O_M)_M$, up to relabelling of the hidden states, which allows
to recover $f^* = (f_1^*,\dots,f_K^*)$ up to relabelling of the hidden states, thanks to (QJ. See also
Gassiat et al. (2015).


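8.2 Proof of Theorem 4

Throughout the proof $N$ is fixed, and we write $\gamma$ (instead of $\gamma_N$) for the contrast function.

8.2.1 Beginning of the proof: algebraic manipulations

Let us fix some $M$ and some permutation $\tau$. Using the definitions of $\hat g_M$ and $\hat M$, we can
write

\[
\gamma(\hat g_{\hat M}) + \mathrm{pen}(N, \hat M) \le \gamma(\hat g_M) + \mathrm{pen}(N, M) \le \gamma\big(g^{\hat Q, f^*_{M,\tau^{-1}}}\big) + \mathrm{pen}(N, M),
\]

where $f^*_{M,\tau^{-1}} = (f^*_{M,\tau^{-1}(1)},\dots,f^*_{M,\tau^{-1}(K)})$ (here we use that $f^*_{M,\tau^{-1}} \in \mathcal{F}^K$). But we can
compute for all functions $t_1, t_2$,

\[
\gamma(t_1) - \gamma(t_2) = \|t_1 - g^*\|_2^2 - \|t_2 - g^*\|_2^2 - 2\nu(t_1 - t_2),
\]

where $\nu$ is the centered empirical process

\[
\nu(t) = \frac{1}{N}\sum_{s=1}^{N} t\big(Y_1^{(s)}, Y_2^{(s)}, Y_3^{(s)}\big) - \int t\, g^* .
\]

This gives

\[
\|\hat g_{\hat M} - g^*\|_2^2 \le \big\|g^{\hat Q, f^*_{M,\tau^{-1}}} - g^*\big\|_2^2 + 2\nu\big(\hat g_{\hat M} - g^{\hat Q, f^*_{M,\tau^{-1}}}\big) + \mathrm{pen}(N,M) - \mathrm{pen}(N,\hat M). \tag{4}
\]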


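Now, we denote by $B_M = \|g^{Q^*, f^*_M} - g^*\|_2^2$ a bias term and we notice that
$g^{\hat Q, f^*_{M,\tau^{-1}}} = g^{P_\tau \hat Q P_\tau^\top, f^*_M}$. Then

\[
\big\|g^{\hat Q, f^*_{M,\tau^{-1}}} - g^*\big\|_2^2 \le 2\big\|g^{\hat Q, f^*_{M,\tau^{-1}}} - g^{Q^*, f^*_M}\big\|_2^2 + 2\big\|g^{Q^*, f^*_M} - g^*\big\|_2^2
\le 2\big\|g^{P_\tau \hat Q P_\tau^\top, f^*_M} - g^{Q^*, f^*_M}\big\|_2^2 + 2B_M .
\]

But, using Schwarz inequality, for transition matrices $Q_1, Q_2$ with stationary distributions
$\pi_1, \pi_2$, $\|g^{Q_1, f^*_M} - g^{Q_2, f^*_M}\|_2^2$ can be bounded by

\[
\sum_{m_1,m_2,m_3=1}^{M} \Big( \sum_{k_1,k_2,k_3=1}^{K} \big(\pi_1(k_1)Q_1(k_1,k_2)Q_1(k_2,k_3) - \pi_2(k_1)Q_2(k_1,k_2)Q_2(k_2,k_3)\big)
\langle f^*_{k_1}, \varphi_{m_1}\rangle \langle f^*_{k_2}, \varphi_{m_2}\rangle \langle f^*_{k_3}, \varphi_{m_3}\rangle \Big)^2
\]
\[
\le \Big( \sum_{k_1,k_2,k_3=1}^{K} \big(\pi_1(k_1)Q_1(k_1,k_2)Q_1(k_2,k_3) - \pi_2(k_1)Q_2(k_1,k_2)Q_2(k_2,k_3)\big)^2 \Big)
\Big( \sum_{m_1,m_2,m_3=1}^{M}\ \sum_{k_1,k_2,k_3=1}^{K} \langle f^*_{k_1}, \varphi_{m_1}\rangle^2 \langle f^*_{k_2}, \varphi_{m_2}\rangle^2 \langle f^*_{k_3}, \varphi_{m_3}\rangle^2 \Big)
\]
\[
\le 3K^3 C_{\mathcal{F},2}^6 \big( \|\pi_1 - \pi_2\|_2^2 + 2\|Q_1 - Q_2\|_F^2 \big), \tag{5}
\]

so that

\[
\big\|g^{\hat Q, f^*_{M,\tau^{-1}}} - g^*\big\|_2^2 \le 6K^3 C_{\mathcal{F},2}^6 \big( \|P_\tau \hat\pi - \pi^*\|_2^2 + 2\|P_\tau \hat Q P_\tau^\top - Q^*\|_F^2 \big) + 2B_M .
\]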


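Next we set $S_M = \cup_Q S(Q, M)$ and

\[
Z_M = \sup_{t \in S_M} \frac{|\nu(t - g^*)|}{\|t - g^*\|_2^2 + x_M^2}
\]

for $x_M$ to be determined later. Then

\[
\nu\big(\hat g_{\hat M} - g^{\hat Q, f^*_{M,\tau^{-1}}}\big) = \nu(\hat g_{\hat M} - g^*) + \nu\big(g^* - g^{\hat Q, f^*_{M,\tau^{-1}}}\big)
\le Z_{\hat M}\big(\|\hat g_{\hat M} - g^*\|_2^2 + x_{\hat M}^2\big) + Z_M\big(\|g^{\hat Q, f^*_{M,\tau^{-1}}} - g^*\|_2^2 + x_M^2\big).
\]

Denoting by $R_{\hat M} = \|\hat g_{\hat M} - g^*\|_2^2$ the squared risk, (4) becomes

\[
R_{\hat M} \le 6K^3 C_{\mathcal{F},2}^6 \big( \|P_\tau \hat\pi - \pi^*\|_2^2 + 2\|P_\tau \hat Q P_\tau^\top - Q^*\|_F^2 \big) + 2B_M + 2Z_{\hat M}(R_{\hat M} + x_{\hat M}^2)
\]
\[
\qquad + 2Z_M\Big( 6K^3 C_{\mathcal{F},2}^6 \big( \|P_\tau \hat\pi - \pi^*\|_2^2 + 2\|P_\tau \hat Q P_\tau^\top - Q^*\|_F^2 \big) + 2B_M + x_M^2 \Big)
+ 2\,\mathrm{pen}(N,M) - \mathrm{pen}(N,M) - \mathrm{pen}(N,\hat M),
\]

\[
R_{\hat M}(1 - 2Z_{\hat M}) \le (2 + 4Z_M)B_M + 2\,\mathrm{pen}(N,M)
+ (1 + 2Z_M)\,6K^3 C_{\mathcal{F},2}^6 \big( \|P_\tau \hat\pi - \pi^*\|_2^2 + 2\|P_\tau \hat Q P_\tau^\top - Q^*\|_F^2 \big)
+ 2\sup_{M'}\big(2Z_{M'}x_{M'}^2 - \mathrm{pen}(N,M')\big).
\]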

To conclude it is then sufficient to establish that, with probability larger than $1 - (e-1)^{-1}e^{-x}$,
it holds

\[
\sup_{M'} Z_{M'} \le \frac{1}{4} \quad \text{and} \quad \sup_{M'}\big(2Z_{M'}x_{M'}^2 - \mathrm{pen}(N,M')\big) \le A\,\frac{x}{N},
\]

with $A$ a constant depending only on $Q^*$ and $f^*$ and not on $N, M, x$. Thus we will have, for
any $M$, with probability larger than $1 - (e-1)^{-1}e^{-x}$,

\[
\frac{1}{2} R_{\hat M} \le 3B_M + 2\,\mathrm{pen}(N,M) + 2A\,\frac{x}{N}
+ 9K^3 C_{\mathcal{F},2}^6 \big( \|P_\tau \hat\pi - \pi^*\|_2^2 + 2\|P_\tau \hat Q P_\tau^\top - Q^*\|_F^2 \big),
\]

which is the announced result.

The heart of the proof is then the study of $Z_M$. We introduce $u_M$ a projection of $g^*$ on
$S_M$ and we split $Z_M$ into two terms: $Z_M \le 4Z_{M,1} + Z_{M,2}$ with

\[
Z_{M,1} = \sup_{t \in S_M} \frac{|\nu(t - u_M)|}{\|t - u_M\|_2^2 + 4x_M^2}, \qquad
Z_{M,2} = \frac{|\nu(u_M - g^*)|}{\|u_M - g^*\|_2^2 + x_M^2}.
\]

Indeed $u_M$ verifies: for all $t \in S_M$,

\[
\|u_M - g^*\|_2 \le \|t - g^*\|_2 \quad \text{and} \quad \|u_M - t\|_2 \le 2\|t - g^*\|_2 .
\]


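8.2.2 Deviation inequality for $Z_{M,2}$

Bernstein's inequality (24) for HMMs (see Appendix A) gives, with probability larger than
$1 - e^{-z}$:

\[
|\nu(u_M - g^*)| \le 2\sqrt{2c^*\|u_M - g^*\|_2^2\,\|g^*\|_\infty\,\frac{z}{N}} + 2\sqrt{2}\,c^*\|u_M - g^*\|_\infty\,\frac{z}{N}.
\]

Then, using $a^2 + b^2 \ge 2ab$, with probability larger than $1 - e^{-z}$:

\[
\frac{|\nu(u_M - g^*)|}{\|u_M - g^*\|_2^2 + x_M^2} \le 2\sqrt{2c^*\|g^*\|_\infty}\,\frac{1}{2x_M}\sqrt{\frac{z}{N}}
+ 2\sqrt{2}\,c^*\,\frac{\|u_M\|_\infty + \|g^*\|_\infty}{x_M^2}\,\frac{z}{N}.
\]

But any function $t$ in $S_M$ can be written

\[
t = \sum_{k_1,k_2,k_3=1}^{K} \pi(k_1)Q(k_1,k_2)Q(k_2,k_3)\, f_{k_1} \otimes f_{k_2} \otimes f_{k_3},
\]

with $f_k \in \mathcal{F}$ for $k = 1,\dots,K$, so that $\sup_{t \in S_M}\|t\|_\infty \le C_{\mathcal{F},\infty}^3$. Then, with probability larger
than $1 - e^{-z_M - z}$,

\[
Z_{M,2} \le 2\sqrt{2c^* C_{\mathcal{F},\infty}^3}\,\frac{1}{2x_M}\sqrt{\frac{z_M + z}{N}}
+ 2\sqrt{2}\,c^*\,\frac{2C_{\mathcal{F},\infty}^3}{x_M^2}\,\frac{z_M + z}{N}. \tag{6}
\]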


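8.2.3 Deviation inequality for $Z_{M,1}$

We shall first study the term $\sup_{t \in B_\sigma}|\nu(t - u_M)|$ where

\[
B_\sigma = \{ t \in S_M,\ \|t - u_M\|_2 \le \sigma \}.
\]

Remark that, for all $t \in S(Q, M)$,

\[
\|t\|_2^2 \le \sum_{k_1,k_2,k_3=1}^{K} \pi^2(k_1)Q^2(k_1,k_2)Q^2(k_2,k_3) \sum_{k_1,k_2,k_3=1}^{K} C_{\mathcal{F},2}^2 C_{\mathcal{F},2}^2 C_{\mathcal{F},2}^2 \le K^3 C_{\mathcal{F},2}^6 .
\]

Then, if $t \in B_\sigma$, $\|t - u_M\|_2 \le \sigma \wedge 2K^{3/2}C_{\mathcal{F},2}^3$. Notice also that for all $t \in S_M$, $\|t - u_M\|_\infty \le
2C_{\mathcal{F},\infty}^3$. Now Proposition 13 in Appendix A (applied to a countable dense set in $B_\sigma$) gives
that for any measurable set $A$ such that $P(A) > 0$,

\[
E^A\Big( \sup_{t \in B_\sigma} |\nu(t - u_M)| \Big) \le C^*\Big[ \frac{E}{N} + \sigma\sqrt{\frac{1}{N}\log\frac{1}{P(A)}} + \frac{2C_{\mathcal{F},\infty}^3}{N}\log\frac{1}{P(A)} \Big]
\]

and

\[
E = \sqrt{N}\int_0^\sigma \sqrt{H(u) \wedge N}\,du + \big(2C_{\mathcal{F},\infty}^3 + 2K^{3/2}C_{\mathcal{F},2}^3\big)H(\sigma).
\]

Here, for any integrable random variable $Z$, $E^A[Z]$ denotes $E[Z\mathbf{1}_A]/P(A)$.

We shall compute $E$ later and find $\sigma_M$ and $\varphi$ such that

\[
\forall \sigma \ge \sigma_M, \qquad E \le \big(1 + 2C_{\mathcal{F},\infty}^3 + 2K^{3/2}C_{\mathcal{F},2}^3\big)\varphi(\sigma)\sqrt{N} \tag{7}
\]

(see Section 8.2.4). We then use Lemma 4.23 in Massart (2007) to write (for $x_M \ge \sigma_M$)

\[
E^A\Big( \sup_{t \in S_M} \frac{|\nu(t - u_M)|}{\|t - u_M\|_2^2 + 4x_M^2} \Big)
\le \frac{C^*}{x_M^2}\Big[ \big(1 + 2C_{\mathcal{F},\infty}^3 + 2K^{3/2}C_{\mathcal{F},2}^3\big)\frac{\varphi(2x_M)}{\sqrt{N}} + 2x_M\sqrt{\frac{1}{N}\log\frac{1}{P(A)}} + \frac{2C_{\mathcal{F},\infty}^3}{N}\log\frac{1}{P(A)} \Big].
\]

Finally, Lemma 2.4 in Massart (2007) ensures that, with probability $1 - e^{-z_M - z}$:

\[
Z_{M,1} = \sup_{t \in S_M} \frac{|\nu(t - u_M)|}{\|t - u_M\|_2^2 + 4x_M^2}
\le C^*\Big[ \big(1 + 2C_{\mathcal{F},\infty}^3 + 2K^{3/2}C_{\mathcal{F},2}^3\big)\frac{\varphi(2x_M)}{x_M^2\sqrt{N}} + 2\sqrt{\frac{z_M + z}{x_M^2 N}} + 2C_{\mathcal{F},\infty}^3\,\frac{z_M + z}{x_M^2 N} \Big]. \tag{8}
\]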
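8.2.4 Computation of the entropy and function $\varphi$

The definition of $H$ given in Proposition 13 shows that $H(\delta)$ is bounded by the classical
bracketing entropy for $L^2$ distance at point $\delta/C_{\mathcal{F},\infty}^3$ (where $C_{\mathcal{F},\infty}^3$ bounds the sup norm of $g^*$):
$H(\delta) \le H(\delta/C_{\mathcal{F},\infty}^3, S_M, L^2)$. We denote by $N(u, S, L^2) = e^{H(u,S,L^2)}$ the minimal number of
brackets of radius $u$ to cover $S$. Recall that when $t_1$ and $t_2$ are real valued functions, the
bracket $[t_1, t_2]$ is the set of real valued functions $t$ such that $t_1(\cdot) \le t(\cdot) \le t_2(\cdot)$, and the
radius of the bracket is $\|t_2 - t_1\|_2$. Now, observe that $S_M = \cup_Q S(Q, M)$ is a set of mixtures
of parametric functions. Denoting $\mathbf{k} = (k_1, k_2, k_3)$, $S_M$ is included in

\[
\Big\{ \sum_{\mathbf{k} \in \{1,\dots,K\}^3} \mu(\mathbf{k})\, f_{k_1} \otimes f_{k_2} \otimes f_{k_3},\ \mu \ge 0,\ \sum_{\mathbf{k} \in \{1,\dots,K\}^3} \mu(\mathbf{k}) = 1,\
f_{k_i} \in \mathcal{F} \cap \mathrm{Span}(\varphi_1,\dots,\varphi_M),\ i = 1,2,3 \Big\}.
\]

Set

\[
\mathcal{A} = \{ f_1 \otimes f_2 \otimes f_3,\ f_i \in \mathcal{F} \cap \mathrm{Span}(\varphi_1,\dots,\varphi_M),\ i = 1,2,3 \}.
\]

Then following the proof in Appendix A of Bontemps and Toussile (2013), we can prove

\[
N(\varepsilon, S_M, L^2) \le \Big(\frac{C_1}{\varepsilon}\Big)^{K^3 - 1} \Big[ N\Big(\frac{\varepsilon}{3}, \mathcal{A}, L^2\Big) \Big]^{K^3}, \tag{9}
\]

where $C_1$ depends on $K$ and $C_{\mathcal{F},2}$. Denote $\mathcal{B} = \mathcal{F} \cap \mathrm{Span}(\varphi_1,\dots,\varphi_M)$. Let $a = (a_m)_{1 \le m \le M} \in \mathbb{R}^M$
and $b = (b_m)_{1 \le m \le M} \in \mathbb{R}^M$ such that $a_m < b_m$, $m = 1,\dots,M$. For each $m = 1,\dots,M$
and $y \in \mathcal{Y}$, let

\[
u_m(y) = \begin{cases} a_m & \text{if } \varphi_m(y) \ge 0, \\ b_m & \text{otherwise,} \end{cases}
\qquad v_m(y) = a_m + b_m - u_m(y).
\]

Then, if $(c_m)_{1 \le m \le M} \in \mathbb{R}^M$ is such that for all $m = 1,\dots,M$, $a_m \le c_m \le b_m$, then

\[
U^1_{a,b}(y) := \sum_{m=1}^{M} u_m(y)\varphi_m(y) \le \sum_{m=1}^{M} c_m\varphi_m(y) \le \sum_{m=1}^{M} v_m(y)\varphi_m(y) =: U^2_{a,b}(y).
\]

Moreover,

\[
\|U^2_{a,b} - U^1_{a,b}\|_2^2 = \Big\| \sum_{m=1}^{M} |b_m - a_m|\,|\varphi_m| \Big\|_2^2 \le M\|b - a\|_2^2
\]

using Cauchy-Schwarz inequality. Thus, one may cover $\mathcal{B}$ with brackets of form $[U^1_{a,b}, U^2_{a,b}]$.
Also, for $i = 1, 2$,

\[
\|U^i_{a,b}\|_2^2 \le \Big\| \sum_{m=1}^{M} (|b_m| + |a_m|)\,|\varphi_m| \Big\|_2^2 \le 2M\big(\|a\|_2^2 + \|b\|_2^2\big).
\]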
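
The bracket construction above can be checked numerically. The sketch below is an illustration only, assuming NumPy, a trigonometric basis on $[0,1]$ evaluated on a grid, and randomly drawn coefficient vectors; all names are hypothetical. It verifies pointwise that $U^1_{a,b} \le \sum_m c_m\varphi_m \le U^2_{a,b}$ whenever $a_m \le c_m \le b_m$, and that $U^2_{a,b} - U^1_{a,b} = \sum_m (b_m - a_m)|\varphi_m|$.

```python
import numpy as np

def bracket_functions(a, b, Phi):
    """Lower and upper brackets U1 <= sum_m c_m phi_m <= U2 for all a <= c <= b.

    Phi: (n_points, M) array of basis functions evaluated on a grid.
    """
    u = np.where(Phi >= 0, a, b)       # u_m(y) = a_m if phi_m(y) >= 0, else b_m
    v = a + b - u                      # v_m(y) is the other endpoint
    return (u * Phi).sum(axis=1), (v * Phi).sum(axis=1)

M = 7
y = np.linspace(0.0, 1.0, 501)
# Trigonometric basis on [0, 1]: the constant, then sqrt(2) cos / sin pairs.
cols = [np.ones_like(y)]
for j in range(1, (M - 1) // 2 + 1):
    cols += [np.sqrt(2) * np.cos(2 * np.pi * j * y),
             np.sqrt(2) * np.sin(2 * np.pi * j * y)]
Phi = np.stack(cols, axis=1)

rng = np.random.default_rng(1)
a = rng.normal(size=M)
b = a + rng.uniform(0.0, 0.5, size=M)
c = a + rng.uniform(size=M) * (b - a)   # any coefficients with a <= c <= b
U1, U2 = bracket_functions(a, b, Phi)
t = Phi @ c
```

For this basis $\sum_m \varphi_m^2(y) = M$ at every $y$, so the Cauchy-Schwarz bound $(U^2_{a,b} - U^1_{a,b})^2 \le M\|b - a\|_2^2$ even holds pointwise.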
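If now for some $a^i, b^i$ in $\mathbb{R}^M$, $f_i \in [U^1_{a^i,b^i}, U^2_{a^i,b^i}]$, $i = 1,2,3$, then

\[
f_1 \otimes f_2 \otimes f_3 \in [V, W]
\]

with

\[
V = \min\{ U^{i_1}_{a^1,b^1} U^{i_2}_{a^2,b^2} U^{i_3}_{a^3,b^3},\ i_1, i_2, i_3 \in \{1,2\} \}
\]

and

\[
W = \max\{ U^{i_1}_{a^1,b^1} U^{i_2}_{a^2,b^2} U^{i_3}_{a^3,b^3},\ i_1, i_2, i_3 \in \{1,2\} \},
\]

pointwise. Moreover, one can see that

\[
|W - V| \le \big|U^2_{a^1,b^1} - U^1_{a^1,b^1}\big| \max_{j_1,j_2 \in \{1,2\}} \big|U^{j_1}_{a^2,b^2} U^{j_2}_{a^3,b^3}\big|
+ \big|U^2_{a^2,b^2} - U^1_{a^2,b^2}\big| \max_{j_1,j_2 \in \{1,2\}} \big|U^{j_1}_{a^1,b^1} U^{j_2}_{a^3,b^3}\big|
+ \big|U^2_{a^3,b^3} - U^1_{a^3,b^3}\big| \max_{j_1,j_2 \in \{1,2\}} \big|U^{j_1}_{a^1,b^1} U^{j_2}_{a^2,b^2}\big|
\]

so that

\[
\|W - V\|_2^2 \le 3\sum_{i=1}^{3} \big\|U^2_{a^i,b^i} - U^1_{a^i,b^i}\big\|_2^2 \prod_{j \ne i} \big( \|U^1_{a^j,b^j}\|_2^2 + \|U^2_{a^j,b^j}\|_2^2 \big)
\]
\[
\le 48M^3 \sum_{i=1}^{3} \|b^i - a^i\|_2^2 \prod_{j \ne i} \big( \|a^j\|_2^2 + \|b^j\|_2^2 \big)
\le 192 M^3 C_{\mathcal{F},2}^4 \sum_{i=1}^{3} \|b^i - a^i\|_2^2 .
\]

Thus one may cover $\mathcal{A}$ by covering the ball of radius $C_{\mathcal{F},2}$ in $\mathbb{R}^M$ with hypercubes $[a, b]$,
for which $\|a\|_2$, $\|b\|_2$ are less than $C_{\mathcal{F},2}$. To get a bracket with radius $u$, it is enough that
$\|b^i - a^i\|_2^2 \le u^2/(576 M^3 C_{\mathcal{F},2}^4)$, $i = 1,2,3$. We finally obtain that

\[
N(u, \mathcal{A}, L^2) \le \Big( \frac{48\sqrt{3}\, M^{3/2} C_{\mathcal{F},2}^3}{u} \Big)^{3M}. \tag{10}
\]

We deduce from (9) and (10) that

\[
N(u, S_M, L^2) \le \Big(\frac{C_1}{u}\Big)^{K^3 - 1} \Big( \frac{3 \cdot 48\sqrt{3}\, M^{3/2} C_{\mathcal{F},2}^3}{u} \Big)^{3MK^3}
\]

and then

\[
H(u, S_M, L^2) \le (K^3 - 1)\log\Big(\frac{C_1}{u}\Big) + 3MK^3 \log\Big(\frac{C_2 M^{3/2}}{u}\Big),
\]

with $C_2$ depending on $K$ and $C_{\mathcal{F},2}$. To conclude we use that $\int_0^\sigma \sqrt{\log(1/x)}\,dx \le \sigma\big(\sqrt{\pi} +
\sqrt{\log(1/\sigma)}\big)$, see Baudry et al. (2012). Finally we can write for $\sigma \le M^{3/2}$:

\[
\int_0^\sigma \sqrt{H(u)}\,du \le C_3\sqrt{M}\,\sigma\Big( 1 + \sqrt{\log\Big(\frac{M^{3/2}}{\sigma}\Big)} \Big),
\]

where $C_3$ depends on $K$, $C_{\mathcal{F},2}$ and $C_{\mathcal{F},\infty}$. Set

\[
\varphi(x) = C_3\sqrt{M}\,x\Big( 1 + \sqrt{\log\Big(\frac{M^{3/2}}{x}\Big)} \Big).
\]

The function $\varphi$ is increasing on $]0, M^{3/2}]$, and $\varphi(x)/x$ is decreasing. Moreover $\varphi(\sigma) \ge
\int_0^\sigma \sqrt{H(u)}\,du$ and $\varphi^2(\sigma) \ge \sigma^2 H(\sigma)$.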


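8.2.5 End of the proof, choice of parameters

As soon as $N \ge C_3^2/M^2 := N_0$, we may define $\sigma_M$ as the solution of equation $\varphi(x) = \sqrt{N}x^2$.
Then, for all $\sigma \ge \sigma_M$,

\[
H(\sigma) \le \frac{\varphi(\sigma)^2}{\sigma^2} \le \frac{\varphi(\sigma)}{\sigma}\,\sigma\sqrt{N}.
\]

This yields, for all $\sigma \ge \sigma_M$,

\[
E \le \big(1 + 2C_{\mathcal{F},\infty}^3 + 2K^{3/2}C_{\mathcal{F},2}^3\big)\varphi(\sigma)\sqrt{N},
\]

which was required in (7).
Moreover $\frac{\varphi(2x_M)}{x_M\sqrt{N}} \le 2\sigma_M$ as soon as $x_M \ge \sigma_M$. Combining (8) and (6), we obtain, with
probability $1 - e^{-z_M - z}$:

\[
Z_M \le C^{**}\Big[ \frac{\sigma_M}{x_M} + \sqrt{\frac{z_M + z}{x_M^2 N}} + \frac{z_M + z}{x_M^2 N} \Big],
\]

where $C^{**}$ depends on $K$, $C_{\mathcal{F},2}$, $C_{\mathcal{F},\infty}$, $Q^*$. Now let us choose $x_M = \theta^{-1}\sqrt{\sigma_M^2 + \frac{z_M + z}{N}}$ with
$\theta$ such that $2\theta + \theta^2 \le (C^{**})^{-1}/4$. This choice entails: $x_M \ge \theta^{-1}\sigma_M$ and $x_M^2 \ge \theta^{-2}\frac{z_M + z}{N}$.
Then with probability $1 - e^{-z_M - z}$:

\[
Z_M \le C^{**}(\theta + \theta + \theta^2).
\]

We now choose $z_M = M$, which implies $\sum_{M \ge 1} e^{-z_M} = (e-1)^{-1}$. Then, with probability
$1 - (e-1)^{-1}e^{-z}$,

\[
\forall M, \qquad Z_M \le C^{**}(2\theta + \theta^2) \le \frac{1}{4},
\]

and for all $M$,

\[
Z_M x_M^2 \le C^{**}\Big[ \sigma_M x_M + x_M\sqrt{\frac{z_M + z}{N}} + \frac{z_M + z}{N} \Big]
\le C^{**}\,2\theta^{-1}\Big( \sigma_M^2 + \frac{z_M + z}{N} \Big) + C^{**}\,\frac{z_M + z}{N}.
\]

Then, with probability $1 - (e-1)^{-1}e^{-z}$, for all $M$,

\[
Z_M x_M^2 - C^{**}\Big( 2\theta^{-1}\sigma_M^2 + (2\theta^{-1} + 1)\frac{M}{N} \Big) \le C^{**}(2\theta^{-1} + 1)\frac{z}{N}.
\]

Then the result is proved as soon as

\[
\mathrm{pen}(N, M) \ge 2C^{**}\Big( 2\theta^{-1}\sigma_M^2 + (2\theta^{-1} + 1)\frac{M}{N} \Big). \tag{11}
\]

It remains to get an upper bound for $\sigma_M$. Recall that $\sigma_M$ is defined as the solution of
equation $C_3\sqrt{M}\,x\big(1 + \sqrt{\log(M^{3/2}/x)}\big) = \sqrt{N}x^2$. Then we obtain that for some $C_4$,

\[
\sigma_M \le C_4\sqrt{\frac{M}{N}}\big(1 + \sqrt{\log(N)}\big)
\]

and (11) holds as soon as

\[
\mathrm{pen}(N, M) \ge \rho^*\,\frac{M\log(N)}{N}
\]

for some constant $\rho^*$ depending on $C_{\mathcal{F},2}$ and $C_{\mathcal{F},\infty}$ (Scenario A) or on $Q^*$, $C_{\mathcal{F},2}$ and $C_{\mathcal{F},\infty}$
(Scenario B).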
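
To get a feeling for the size of $\sigma_M$, the equation $\varphi(x) = \sqrt{N}x^2$ can be solved numerically. The following Python sketch is an illustration under stated assumptions: the constant $C_3$ is not known explicitly, so the value $C_3 = 1$ is a placeholder, and the function names are hypothetical. Bisection applies because $\varphi(x)/x$ is decreasing while $\sqrt{N}x$ is increasing on $]0, M^{3/2}]$.

```python
import math

def phi(x, M, C3=1.0):
    """phi(x) = C3 * sqrt(M) * x * (1 + sqrt(log(M^{3/2} / x))) for 0 < x <= M^{3/2}."""
    return C3 * math.sqrt(M) * x * (1.0 + math.sqrt(math.log(M ** 1.5 / x)))

def sigma_M(M, N, C3=1.0, rel_tol=1e-12):
    """Solve phi(x) = sqrt(N) * x^2 on (0, M^{3/2}] by bisection."""
    top = M ** 1.5
    if N < C3 ** 2 / M ** 2:
        raise ValueError("a solution in (0, M^{3/2}] requires N >= C3^2 / M^2")

    def h(x):
        # h is negative below the root and non-negative above it.
        return math.sqrt(N) * x * x - phi(x, M, C3)

    lo, hi = 1e-12 * top, top
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return hi

# Values taken from the experiments section: M = 23 and N = 5e4; C3 = 1 is assumed.
x = sigma_M(23, 50_000)
bound = math.sqrt(23 / 50_000) * (1.0 + math.sqrt(math.log(50_000)))
```

With these values the numerical root lies below $\sqrt{M/N}\,(1 + \sqrt{\log N})$, in line with the upper bound stated above when $C_4$ is taken equal to the placeholder $C_3$.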

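8.3 Proof of Theorem 6

For any $h \in \mathcal{K}^K$ and $Q \in \mathcal{V}$, denote $N(Q, h) = \|g^{Q, f^* + h} - g^{Q, f^*}\|_2^2$. What we want to prove
is that

\[
c := c(\mathcal{K}, \mathcal{V}, \mathcal{F}^*)^2 := \inf_{Q \in \mathcal{V},\ h \in \mathcal{K}^K,\ \|h\|_{Q^*} \ne 0} \frac{N(Q, h)}{\|h\|_{Q^*}^2} > 0.
\]

One can compute:

\[
N(Q, h) = \sum_{k_1,k_2,k_3,k'_1,k'_2,k'_3=1}^{K} \pi(k_1)Q(k_1,k_2)Q(k_2,k_3)\,\pi(k'_1)Q(k'_1,k'_2)Q(k'_2,k'_3)
\]
\[
\Big( \prod_{i=1}^{3} \langle f^*_{k_i} + h_{k_i}, f^*_{k'_i} + h_{k'_i} \rangle + \prod_{i=1}^{3} \langle f^*_{k_i}, f^*_{k'_i} \rangle
- \prod_{i=1}^{3} \langle f^*_{k_i} + h_{k_i}, f^*_{k'_i} \rangle - \prod_{i=1}^{3} \langle f^*_{k_i}, f^*_{k'_i} + h_{k'_i} \rangle \Big).
\]

Let $u = (u_1,\dots,u_K)$ be such that $u_i$, $i = 1,\dots,K$, is the orthogonal projection of $h_i$ on the
subspace of $L^2(\mathcal{Y}, \mathcal{L}^D)$ spanned by $f_1^*,\dots,f_K^*$. Then

\[
N(Q, h) = N(Q, u) + M(Q, u, h - u) \tag{12}
\]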
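where, for any $a = (a_1,\dots,a_K) \in L^2(\mathcal{Y}, \mathcal{L}^D)^K$,

\[
M(Q, u, a) = \sum_{k_1,k_2,k_3,k'_1,k'_2,k'_3=1}^{K} \pi(k_1)Q(k_1,k_2)Q(k_2,k_3)\,\pi(k'_1)Q(k'_1,k'_2)Q(k'_2,k'_3)
\]
\[
\Big( \prod_{i=1}^{3} \langle a_{k_i}, a_{k'_i} \rangle + \sum_{i=1}^{3} \langle a_{k_i}, a_{k'_i} \rangle \prod_{j \ne i} \langle f^*_{k_j} + u_{k_j}, f^*_{k'_j} + u_{k'_j} \rangle
+ \sum_{i=1}^{3} \langle f^*_{k_i} + u_{k_i}, f^*_{k'_i} + u_{k'_i} \rangle \prod_{j \ne i} \langle a_{k_j}, a_{k'_j} \rangle \Big).
\]

Let $A = \mathrm{Diag}[\pi]$ with $\pi$ the stationary distribution of $Q$. Then $M(Q, u, a)$ may be rewritten
as:

\[
M(Q, u, a) = \sum_{i,j=1}^{K} \Big\{ \langle (Q^\top A a)_i, (Q^\top A a)_j \rangle \langle a_i, a_j \rangle \langle (Qa)_i, (Qa)_j \rangle
\]
\[
+ \langle (Q^\top A a)_i, (Q^\top A a)_j \rangle \langle (f^* + u)_i, (f^* + u)_j \rangle \langle (Q(f^* + u))_i, (Q(f^* + u))_j \rangle
\]
\[
+ \langle (Q^\top A(f^* + u))_i, (Q^\top A(f^* + u))_j \rangle \langle a_i, a_j \rangle \langle (Q(f^* + u))_i, (Q(f^* + u))_j \rangle
\]
\[
+ \langle (Q^\top A(f^* + u))_i, (Q^\top A(f^* + u))_j \rangle \langle (f^* + u)_i, (f^* + u)_j \rangle \langle (Qa)_i, (Qa)_j \rangle
\]
\[
+ \langle (Q^\top A(f^* + u))_i, (Q^\top A(f^* + u))_j \rangle \langle a_i, a_j \rangle \langle (Qa)_i, (Qa)_j \rangle
\]
\[
+ \langle (Q^\top A a)_i, (Q^\top A a)_j \rangle \langle (f^* + u)_i, (f^* + u)_j \rangle \langle (Qa)_i, (Qa)_j \rangle
\]
\[
+ \langle (Q^\top A a)_i, (Q^\top A a)_j \rangle \langle a_i, a_j \rangle \langle (Q(f^* + u))_i, (Q(f^* + u))_j \rangle \Big\}.
\]

All terms in this sum are non negative. Let us prove it for the first one, the argument for
the others is similar. Define $V$ the $K \times K$ matrix given by

\[
V_{i,j} = \langle (Q^\top A a)_i, (Q^\top A a)_j \rangle \langle (Qa)_i, (Qa)_j \rangle, \qquad i, j = 1,\dots,K.
\]

$V$ is the Hadamard product of two Gram matrices which are non negative, thus $V$ is itself
non negative by the Schur product Theorem, see Schur (1911), and

\[
\sum_{i,j=1}^{K} V_{i,j}\langle a_i, a_j \rangle = \int a(y)^\top V a(y)\,dy \ge 0.
\]

Thus we have that $M(Q, u, a)$ is lower bounded by one term of the sum so that

\[
M(Q, u, a) \ge \sum_{i,j=1}^{K} \langle (Q^\top A(f^* + u))_i, (Q^\top A(f^* + u))_j \rangle \langle a_i, a_j \rangle \langle (Q(f^* + u))_i, (Q(f^* + u))_j \rangle .
\]

The minimal eigenvalue of the Hadamard product of two non negative matrices is lower
bounded by the product of the minimal eigenvalues of each matrix, and we get

\[
M(Q, u, a) \ge \Big( \min_{i=1,\dots,K} \lambda_i(Q^\top A(f^* + u)) \Big)\Big( \min_{i=1,\dots,K} \lambda_i(Q(f^* + u)) \Big)\|a\|_2^2 \tag{13}
\]

where $\|a\|_2^2 = \sum_{k=1}^{K} \|a_k\|_2^2$, and where, if $h \in L^2(\mathcal{Y}, \mathcal{L}^D)^K$, $\lambda_1(h),\dots,\lambda_K(h)$ are the (non
negative) eigenvalues of the Gram matrix of $h_1,\dots,h_K$.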
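
The two facts about Hadamard products used above (the Schur product theorem and the lower bound on the minimal eigenvalue) can be illustrated numerically. The sketch below assumes NumPy and uses random finite-dimensional vectors in place of functions, so that Gram matrices are plain matrix products; it is a sanity check on one random instance, not a proof.

```python
import numpy as np

def min_eigenvalue(A):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(A).min())

rng = np.random.default_rng(0)
K, n = 4, 50
F1 = rng.normal(size=(K, n))   # rows play the role of K functions
F2 = rng.normal(size=(K, n))
G1 = F1 @ F1.T                 # Gram matrices, positive semi-definite
G2 = F2 @ F2.T
H = G1 * G2                    # Hadamard (entrywise) product

lam_H = min_eigenvalue(H)
lam_product = min_eigenvalue(G1) * min_eigenvalue(G2)
```

On this instance `lam_H` is non-negative and at least as large as `lam_product`, which is the inequality used to obtain (13).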
Let now $(Q_n, h_n)_n$ be a sequence in $\mathcal{V} \times \mathcal{K}^K$ such that $c = \lim_n \frac{N(Q_n, h_n)}{\|h_n\|_{Q^*}^2}$. Let $u_n$ be the
vector of the orthogonal projections of the coordinate functions of $h_n$ on the subspace of
$L^2(\mathcal{Y}, \mathcal{L}^D)$ spanned by $f_1^*,\dots,f_K^*$. Notice that

\[
\|h_n\|_{Q^*}^2 = \|u_n\|_{Q^*}^2 + \|h_n - u_n\|_2^2 .
\]

Let $C_{\mathcal{K},2}$ be the upper bound of the norm of elements of $\mathcal{K}$. We have, for any $n \ge 1$,

\[
\|h_n\|_{Q^*}^2 \le K(C_{\mathcal{K},2} + 2C_{\mathcal{F},2})^2
\]

so that for any $n \ge 1$,

\[
\|u_n\|_{Q^*}^2 \le K(C_{\mathcal{K},2} + 2C_{\mathcal{F},2})^2 .
\]

Since $(Q_n, u_n)_n$ is a bounded sequence in a finite dimensional space it has a limit point
$(Q, u)$. Now, using (12) and the non negativity of $M(Q_n, u_n, h_n - u_n)$, we get on the
converging subsequence

\[
c \ge \lim_{n \to +\infty} \frac{N(Q_n, u_n)}{K(C_{\mathcal{K},2} + 2C_{\mathcal{F},2})^2} = \frac{N(Q, u)}{K(C_{\mathcal{K},2} + 2C_{\mathcal{F},2})^2}.
\]

Since $Q \in \mathcal{V}$, $\mathcal{T}_Q \subset \mathcal{T}_{Q^*}$ so that $\|u\|_Q \ge \|u\|_{Q^*}$. Thus if $\|u\|_{Q^*} \ne 0$, $\|u\|_Q \ne 0$, and using
Lemma 3, $N(Q, u) \ne 0$ so that $c > 0$ in this case.

Consider now the situation where $\|u\|_{Q^*} = 0$. Since $\lim_{n \to +\infty}\|u_n\|_{Q^*} = 0$, there exists $n_1$
and $\tau \in \mathcal{T}_{Q^*}$ such that for all $n \ge n_1$, one has $\|u_n\|_{Q^*}^2 = \sum_{k=1}^{K} \|u_{n,k} + f_k^* - f^*_{\tau(k)}\|_2^2$, and it
is possible to exchange the states in the transition matrix using $\tau$ so that we just have to
consider the situation where $\|u_n\|_{Q^*} = \|u_n\|_2$ for large enough $n$.

Eigenvalues of Gram matrices of functions are continuous in the functions so that using (12)
and (13) we get

\[
c \ge \liminf_{n \to +\infty} \frac{N(Q_n, u_n)}{\|u_n\|_2^2 + \|h_n - u_n\|_2^2}
+ \Big( \min_{i=1,\dots,K} \lambda_i(Q^\top A f^*) \Big)\Big( \min_{i=1,\dots,K} \lambda_i(Q f^*) \Big) \liminf_{n \to +\infty} \frac{\|h_n - u_n\|_2^2}{\|u_n\|_2^2 + \|h_n - u_n\|_2^2}.
\]

Under assumptions [H1] and [H4], $Q^\top A f^*$ is a vector of linearly independent functions and
$Q f^*$ also, so that

\[
\Big( \min_{i=1,\dots,K} \lambda_i(Q^\top A f^*) \Big)\Big( \min_{i=1,\dots,K} \lambda_i(Q f^*) \Big) > 0.
\]

Thus, if $\liminf_{n \to +\infty} \frac{\|h_n - u_n\|_2^2}{\|u_n\|_2^2 + \|h_n - u_n\|_2^2} > 0$ we obtain $c > 0$.

If now $\liminf_{n \to +\infty} \frac{\|h_n - u_n\|_2^2}{\|u_n\|_2^2 + \|h_n - u_n\|_2^2} = 0$, we have on a subsequence

\[
c \ge \lim_{n \to +\infty} \frac{N(Q_n, u_n)}{\|u_n\|_2^2}\,\frac{\|u_n\|_2^2}{\|u_n\|_2^2 + \|h_n - u_n\|_2^2} = \lim_{n \to +\infty} \frac{N(Q_n, u_n)}{\|u_n\|_2^2} \tag{14}
\]
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with (u n )„, a sequence of vectors of functions in the finite dimensional space spanned by 
/*,..., and such that for i = 1,..., K, 


u m = o 

n—»+oo J ||u n || 2 


(15) 


since for all n and alii = 1,. .., K, f u j dy = 0. 

Let us return to general considerations on the function $N(\cdot, \cdot)$. As may be seen from
its formula, $N(Q, \mathbf{h})$ is polynomial in the variables $Q_{i,j}$, $\langle f_i^*, f_j^*\rangle$, $\langle h_i, f_j^*\rangle$, $\langle h_i, h_j\rangle$, $i, j =
1, \dots, K$. Let $D(Q, \mathbf{h})$ denote the part of $N(Q, \mathbf{h})$ which is homogeneous of degree 2 with
respect to the variable $\mathbf{h}$, that is


K 


D( Q, h) = £ A: 2 )Q(A: 2 , k 3 )7r(k[)Q(k[, fe')Q(fe', k' 3 ) 

fei ,/c2,Al3,/c^,fc25^3 —1 

3 3 

n (fkMihk^fc,) i( 16 ) 


i=l 




i=l 




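One gets

\[
N(Q, \mathbf{h}) = D(Q, \mathbf{h}) + O\big(\|\mathbf{h}\|_2^3\big),
\]

where the $O(\cdot)$ depends only on $\mathbf{f}^*$. Let us first notice that $D(\cdot, \cdot)$ is always non negative.
Indeed, since for all $Q \in \mathcal{V}$ and all $\mathbf{h} \in (L^2(\mathcal{Y}, \mathcal{L}^D))^K$ one has $N(Q, \mathbf{h}) \ge 0$, it holds

\[
\forall Q \in \mathcal{V},\ \forall \mathbf{h} \in (L^2(\mathcal{Y}, \mathcal{L}^D))^K, \quad D(Q, \mathbf{h}) + O\big(\|\mathbf{h}\|_2^3\big) \ge 0,
\]

so that, since for all $\lambda \in \mathbb{R}$, $D(Q, \lambda\mathbf{h}) = \lambda^2 D(Q, \mathbf{h})$,

\[
\forall Q \in \mathcal{V},\ \forall \mathbf{h} \in (L^2(\mathcal{Y}, \mathcal{L}^D))^K, \quad D(Q, \mathbf{h}) \ge 0.
\tag{17}
\]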
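The step from the perturbative inequality to (17) is a scaling argument. It can be spelled out as follows, assuming that the remainder is of order $\|\mathbf{h}\|_2^3$ (our reading of the expansion, since $D$ collects exactly the degree-2 terms in $\mathbf{h}$); the implied constant is the one that depends only on $\mathbf{f}^*$.

```latex
% Fix Q in V and h, and apply N >= 0 to the rescaled perturbation \lambda h:
\begin{align*}
0 \;\le\; N(Q, \lambda \mathbf{h})
  &= D(Q, \lambda \mathbf{h}) + O\big(\|\lambda \mathbf{h}\|_2^3\big)
   = \lambda^2 D(Q, \mathbf{h}) + O\big(|\lambda|^3 \|\mathbf{h}\|_2^3\big). \\
% Divide by \lambda^2 > 0 and let \lambda tend to 0:
D(Q, \mathbf{h}) + O\big(|\lambda|\, \|\mathbf{h}\|_2^3\big) &\ge 0
  \quad\Longrightarrow\quad D(Q, \mathbf{h}) \ge 0 \quad (\lambda \to 0).
\end{align*}
```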

Then we obtain from (14)

\[
c \;\ge\; \liminf_{n\to+\infty} D\Big(Q_n, \frac{\mathbf{u}_n}{\|\mathbf{u}_n\|_2}\Big).
\]

Let $\mathbf{b} = (b_1, \dots, b_K)$ be a limit point of the sequence $(\mathbf{u}_n / \|\mathbf{u}_n\|_2)_n$. We then have

\[
c \;\ge\; D(Q, \mathbf{b}).
\]

Now, using (15) we get that

\[
\int b_k\, d\mathcal{L}^D = 0, \quad k = 1, \dots, K.
\]

Thus there exists a $K \times K$ matrix $U$ such that $\mathbf{b}^T = U(\mathbf{f}^*)^T$ and $U\mathbf{1}_K = 0$, and equation
(16) leads to

\[
\begin{aligned}
D(Q, \mathbf{b}) = \sum_{i,j=1}^{K} \Big\{
 & (Q^T A U G^* U^T A Q)_{i,j}\, (G^*)_{i,j}\, (Q G^* Q^T)_{i,j}
 + (Q^T A G^* A Q)_{i,j}\, (U G^* U^T)_{i,j}\, (Q G^* Q^T)_{i,j} \\
 & + (Q^T A G^* A Q)_{i,j}\, (G^*)_{i,j}\, (Q U G^* U^T Q^T)_{i,j}
 + 2\Big[ (Q^T A U G^* A Q)_{i,j}\, (U G^*)_{i,j}\, (Q G^* Q^T)_{i,j} \\
 & + (Q^T A U G^* A Q)_{i,j}\, (Q U G^* Q^T)_{i,j}\, (G^*)_{i,j}
 + (U G^*)_{i,j}\, (Q U G^* Q^T)_{i,j}\, (Q^T A G^* A Q)_{i,j} \Big] \Big\},
\end{aligned}
\]

with $G^*$ the $K \times K$ Gram matrix such that $(G^*)_{i,j} = \langle f_i^*, f_j^*\rangle$, $i, j = 1, \dots, K$.

This is the quadratic form $\mathcal{D}$ in $U_{i,j}$, $i = 1, \dots, K$, $j = 1, \dots, K - 1$, defined in Section 4.2.
This quadratic form is non negative, and as soon as it is positive, we get that $c > 0$. But
the quadratic form $\mathcal{D}$ is positive as soon as its determinant is non zero, that is if and only
if $H(Q, G(\mathbf{f}^*)) \neq 0$.


8.4 Proof of Lemma 5

Here we specialize to the situation where $K = 2$. In such a case, $\mathbf{f}^* = (f_1^*, f_2^*)$, and

\[
Q^* = \begin{pmatrix} 1 - p^* & p^* \\ q^* & 1 - q^* \end{pmatrix}
\]

for some $p^*$, $q^*$ in $[0,1]$ for which $0 < p^* < 1$, $0 < q^* < 1$, $p^* \neq 1 - q^*$. Now

\[
U = \begin{pmatrix} \alpha & -\alpha \\ \beta & -\beta \end{pmatrix}
\]

for some real numbers $\alpha$ and $\beta$, and brute force computation gives $D(Q, \mathbf{b}) = D_{1,1}\alpha^2 +
2D_{1,2}\alpha\beta + D_{2,2}\beta^2$ with, denoting $p = Q(1,2)$ and $q = Q(2,1)$:


(P + q) 2 D 1A 


=2(i — P ) 2 \\fi - / 2 i 2 ii/ii 2 ii(i - p)k +p/ 2 i 2 +ii(i - p)ft+pm 4 \\fi - m 2 

+ 4 p(i — p) (((1 — p)f* + pft,qft + (1 — q)ft)) (ftJt)Wft - ft \\ 2 
+ 2p 2 \\K - / 2 *|| 2 ||/ 2 l 2 lk/ 1 * + (1 - q)ft \\ 2 
+ 2(1 - pf (((1 - P )ft+pfi n - r 2 )f mw 2 

+ 2p 2 ((qf* + (1 - q)fl ft - ft)) 2 \\m 2 

+ 4p(l - p){qft + (1 - q)fi, ft - ft)(( 1 - p)ft + Pft, ft - ft)(ft, ft) 

+4(1.—p)<(i— P )ft+pfti ft - ft)(ft, ft - ftm-p)ft+pft \\ 2 
+ 4 p(( 1 — p)ft + pft, ft — ft)(ft, ft ~ ft)( 0 - — p)ft + pft, qft + (1 ~ q)ft), 




(p + q) 2 D- 2 ,2 


=^ 2 \\fl - ft H 2 H/lTll(l - P)fl+vm 2 + htt + (1 - q)ti lll/i* - / 2 I 2 

+ 4(1 - q)q(((l-p)ff+pft,qff + (1 - q)f 2 )) (ff,ft)\\ff “ / 2 T 
+ 2(1 - q) 2 \\fi - / 2 l 2 ||/ 2 l 2 ||g/r + (1 - g)/ 2 *|| 2 

+ 2 q 2 (((1 - p)ft+ P ft, ff ~ ft)) 2 II m 2 

+ 2(1 - qf ((qft + (1 - q)fl ff - ff)) 2 ||/ 2 || 2 

+ 4g(l - q)(qff + (1 - q)ff, ft ~ ft)(( 1 - p)ft + Pft, ft ~ ft) (ft, ft) 

+ 4 q(qft + (1 - q)ff , ft ~ ft) (ft, ft ~ / 2 *)((1 - p)ff+pff, qft + (1 “ ?)/ 2 *) 
+ 4(1 - q)(qff + (1 - q)f*, ff - f*)(f*, ft - ff)\\qft + (1 - q)ft f , 


and: 


(JP + q) 2 Dh2 

pq 


=2(1 — p)q\\ft — / 2 ll 2 ll/i ll 2 ll(i — p)ft + p/ 2 II 2 

+ 2{pq + (1 -p)(l - g)] (((1 -p)ft +Pft,qft + (1 " <?)/ 2 *» </i*, ft) lift ~ ftt 
+ (((1 -P)ft+Pft,qft + (1 - q)ft )) 2 lift ~ ft \\ 2 
+ 2p(l - 5)11/? - / 2 *|| 2 ||/ 2 *|| 2 |k/i + (l - g)/!H 2 
+ 2g(l - p) (((1 - p)ft + Pft, ft - ft )) 2 ll/ll 2 

+ 2p(l - g) ((9/i* + (1 - q)ft, ft - ft)? WftW 2 

+ 2 pq(qff + (1 - q)ft, ft ~ ft)( 0 - ~ P)ft + Pft, /l ~ ft)(ft, ft) 

+ 2(1 - p)(l - q)(qft + (1 - q)ft, A ~ ft)(( 1 ~ P)ft + Pft, ft ~ ft)(ft, ft) 

+ q(( 1 - P)ft+Pft, ft - ft)(ft, ft - / 2 )ll(l - P)ft +Pft \\ 2 

+ 2(1 - P )(qft + (1 - q)ft, ft ~ ft)(ft , ft - ft)(^-p)ft+pft,qft + (1 - q)ft) 

+ 2(1 - g)((l - p)ft + pft, ff ~ ft)(ft, ft - / 2 }(( 1 - P)ff + Pft, qft + (1 - q)ft) 
+ 2 P (qft + (i - q)ff, ff - ft)(ff, ft - ft)hff + (i - q)ft\\ 2 - 


We have:

\[
H(Q, G(\mathbf{f}^*)) = D_{1,1} D_{2,2} - D_{1,2}^2.
\]

We shall now write $H(Q, G(\mathbf{f}^*))$ using

\[
n_1 = \|f_1^*\|_2, \quad n_2 = \|f_2^*\|_2, \quad a = \frac{\langle f_1^*, f_2^*\rangle}{\|f_1^*\|_2 \|f_2^*\|_2},
\]

for which the range is $[1, \infty[^2 \times [0, 1[$. Doing so, we obtain a polynomial $P_1$ in the variables
$n_1$, $n_2$, $a$, $p$ and $q$.

First observe that, by symmetry,

\[
P_1(n_1, n_2, a, p, q) = P_1(n_2, n_1, a, q, p),
\]

so that it is sufficient to prove that the polynomial $P_1$ is positive on the domain

\[
1 \le n_2 \le n_1,
\tag{18}
\]
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and 0 < a < 1 and 0 < p ^ q < 1 . 
Furthermore, consider the change of variable

\[
q = 1 - p + d;
\]

then we have a polynomial $P_2$ in the variables $n_1$, $n_2$, $a$, $p$ and $d$ which factorizes with

\[
\frac{p^2 (1 - a^2)\, d^2\, n_1^2 n_2^2\, (1 + d - p)^2}{(1 + d)^4}.
\]

Dividing by this factor, one gets a polynomial $P_3$ which is homogeneous of degree 8 in $n_1$
and $n_2$, so that one may set $n_1 = 1$ and keep $b = n_2 \in\, ]0, 1]$ (observe that we have used
(18) to reduce the problem to the domain $n_2 / n_1 \le 1$) and obtain a polynomial $P_4$ in the
variables $b$, $a$, $p$ and $d$. It remains to prove that $P_4$ is positive on $\mathcal{D}_4 = \{b \in\, ]0, 1],\ a \in
[0, 1[,\ p \in\, ]0, 1[,\ d \in\, ]p - 1, 0[\, \cup\, ]0, p[\}$.

Consider now the following change of variables

\[
b = \frac{1}{1 + x^2}, \quad a = \frac{y^2}{1 + y^2}, \quad p = \frac{z^2}{1 + z^2}, \quad \text{and} \quad d = \frac{(tz)^2 - 1}{(1 + t^2)(1 + z^2)},
\]

mapping $(x, y, z, t) \in \mathbb{R}^4$ onto $(b, a, p, d) \in \mathcal{D}_5 = \{b \in\, ]0, 1],\ a \in [0, 1[,\ p \in [0, 1[,\ d \in\, ]p - 1, p[\}$
which contains $\mathcal{D}_4$. This change of variables maps $P_4$ onto a rational fraction with positive
denominator, namely

\[
(1 + t^2)^4 (1 + y^2)^4 (1 + z^2)^4 (1 + x^2)^8.
\]
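The change of variables above can be sanity-checked numerically. The sketch below uses exact rational arithmetic on a small grid; the function names and the grid are illustrative choices, not part of the paper. It relies on the elementary identity $d = p - 1/(1 + t^2)$, which follows by writing $(tz)^2 - 1 = z^2(1 + t^2) - (1 + z^2)$. Note that at $t = 0$ one gets $d = p - 1$ exactly, so the check uses $p - 1 \le d < p$.

```python
from fractions import Fraction
from itertools import product


def change_of_variables(x, y, z, t):
    """Map (x, y, z, t) to (b, a, p, d) following the formulas in the text."""
    x, y, z, t = (Fraction(v) for v in (x, y, z, t))
    b = 1 / (1 + x**2)
    a = y**2 / (1 + y**2)
    p = z**2 / (1 + z**2)
    d = ((t * z) ** 2 - 1) / ((1 + t**2) * (1 + z**2))
    return b, a, p, d


def in_domain(b, a, p, d):
    """Membership test for the image domain (lower bound on d is attained at t = 0)."""
    return 0 < b <= 1 and 0 <= a < 1 and 0 <= p < 1 and p - 1 <= d < p


# Illustrative grid of rational sample points in [-3, 3].
grid = [Fraction(k, 2) for k in range(-6, 7)]

all_in_domain = all(
    in_domain(*change_of_variables(*point)) for point in product(grid, repeat=4)
)

# Check the closed form d = p - 1 / (1 + t^2) on the same grid.
identity_ok = all(
    change_of_variables(0, 0, z, t)[3]
    == change_of_variables(0, 0, z, t)[2] - 1 / (1 + t**2)
    for z, t in product(grid, repeat=2)
)
```

Since $d = p - 1/(1 + t^2)$, every $d \in\, ]p - 1, p[$ is reached by some $t \neq 0$, which is what is needed to cover $\mathcal{D}_4$.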

So it remains to prove that its numerator $P_5$, which is polynomial, is positive on $\mathbb{R}^4$. An
expression of $P_5$ can be found in Appendix B. Observe that $P_5$ is polynomial in $x^2$, $y^2$, $z^2$
and $t^2$ and there are only three monomials with negative coefficients. These monomials can
be expressed as sums of squares using other monomials, namely:

• $-18x^{12}t^2 + 27x^{12} + 1979x^{12}t^4 = 18x^{12} + 9(x^6 - x^6t^2)^2 + 1970x^{12}t^4$,

• $-108x^{10}t^2 + 1970x^{12}t^4 + 495x^8 = 439x^8 + 56(x^4 - x^6t^2)^2 + 1914x^{12}t^4 + 4t^2x^{10}$,

• and $-114x^8t^2 + 972x^4 + 1914x^{12}t^4 = 915x^4 + 57(x^2 - x^6t^2)^2 + 1857x^{12}t^4$.

Thus $P_5$ is equal to 144 plus a sum of squares, hence it is positive. This proves that
$H(Q, G(\mathbf{f}^*))$ is always positive.
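The three sum-of-squares rewritings can be checked mechanically. The sketch below is not the authors' computer-assisted computation; it evaluates both sides of each identity in exact integer arithmetic on a 13 by 5 grid. Both sides have degree at most 12 in $x$ and at most 4 in $t$, so agreement on such a product grid implies that the polynomials are equal.

```python
from itertools import product

# Each entry is (left-hand side, right-hand side) of one identity from the text.
identities = [
    (
        lambda x, t: -18 * x**12 * t**2 + 27 * x**12 + 1979 * x**12 * t**4,
        lambda x, t: 18 * x**12 + 9 * (x**6 - x**6 * t**2) ** 2 + 1970 * x**12 * t**4,
    ),
    (
        lambda x, t: -108 * x**10 * t**2 + 1970 * x**12 * t**4 + 495 * x**8,
        lambda x, t: 439 * x**8 + 56 * (x**4 - x**6 * t**2) ** 2
        + 1914 * x**12 * t**4 + 4 * t**2 * x**10,
    ),
    (
        lambda x, t: -114 * x**8 * t**2 + 972 * x**4 + 1914 * x**12 * t**4,
        lambda x, t: 915 * x**4 + 57 * (x**2 - x**6 * t**2) ** 2 + 1857 * x**12 * t**4,
    ),
]


def identity_holds(lhs, rhs, n_x=13, n_t=5):
    """Compare both sides on an n_x by n_t integer grid (exact arithmetic)."""
    return all(lhs(x, t) == rhs(x, t) for x, t in product(range(n_x), range(n_t)))


results = [identity_holds(lhs, rhs) for lhs, rhs in identities]
```

Note how the leftover monomials chain together: the first identity leaves $1970x^{12}t^4$, which the second one consumes, and the second leaves $1914x^{12}t^4$, which the third one consumes.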


8.5 Proof of Theorem 7

Let $\mathcal{K} = \{\mathbf{h} = \mathbf{f} - \mathbf{f}^*,\ \mathbf{f} \in \mathcal{F}^K\}$. Using Theorem 4 we get that for all $x > 0$, for all $N \ge N_0$,
with probability $1 - (e - 1)^{-1} e^{-x}$, one has for any permutation $\tau_N$,

\\g-9*\\l < 6inf |||/-5 Q *’ f "lli + pen(AT,M)} + A\^ (19) 

+18Cjt,2 (lIQ* — Pr„4vPj, II l + Ik* - Pr^^lli) • 
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Notice that writing 


K 


Hvu V2, 2 / 3 ) = £ ( ¥ T N n)(k 1 )(F TN QFjj(k 1 ,k 2 ){F TN QFl N )(k2,k 3 ) 

ki,k2,k3=l 

X frN(ki) (yi)/rjv(fc2) (Vs) fr N (ks) (?/3) , 

and applying Theorem [6] we get that, on the event P rjv QIPJ^, € V, there exists r such 

that 


jT 


K 


£ II fr(t) ~ lr 


,T " < ‘)" 2 5 c(K,V,5*) 2 

Now by the triangle inequality

||3_/nvQ p 4. f *|| 2 < ||0-/|| 2 + ||/3*,f* _/-ivQ p 4’ r || 2 . 
Similarly to (j5]), we have 


- /^Q P r JV . f *l |2 


2 - 


fc=l 


( 20 ) 


( 21 ) 


II 9 


Q*,f* 


P’rjv Q P rjv )f* I 


< 3/i: 3 c|- 


K* - ^71 


+ 21| Q* — F rjv QP. 


]> T I 

' t n I 


In the same way, 


( 22 ) 


(/ - 3 Q *’ f ") ( 2 / 1 , 2 / 2 , 2 / 3 ) = 

K 

£ ^(feOQ^fer,fc 2 )Q*(^,fc 3 ) UtM)fUy-2)ftM) - fM,kM)fMte(y*)fM,kM) 

ki,k2,k3=l 


so that 

11/ - Z 3 *’^ III < 3it 3 C£ max{||/ fc * - /j/tlll, fc = 1,. .., K}. 

Thus, collecting (19), (20), (21), (22), and with an appropriate choice of $A^*$, we get Theorem 7.

8.6 Proof of Corollary 10

We shall apply Theorem 1111 where, for each N, we define hy such that (—log<5y)/<5y : = 
(log N ) 3 / 2 . Notice first that hy goes to 0 and that My tends to infinity as N tends to infinity, 
so that for large enough N, My > M$*. By denoting ry the tm n given by Theorem 1111 we 
get that for all x > x(Q*), for all N > N(Q*, $*)x log IV, with probability 1 — [4 + (e — 
l)- 1 ]^* - 2<5y, 

ik* - p rjV 7r|| 2 < c(Q*,r)J^-v^ 

and 

\\Q*-F TN QFjj<C(Q\F) ) J^fv^. 
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We first obtain that 

N 


lim sup E 

iV—>■+oo 


log 


q*-p tjv qp tjvI 


< 


c(Q\r ) 2 


r*+oo 


/ lim sup P 

0 TV—»+cso 


Vn 


IQ* _ IP QP^l,. II > y/x) dx < 


C(Q*,^*)y/\ogN 

r+oo 

C(Q*, r) 2 x(Q*) + C(Q*, r ) 2 / [4 + (e - 1 y^e^dx < +oo 

J x(Q*) 


so that 


E 


|Q*-P rjv QP TJV , 


= O 


logiV 

N 


Similarly, one has E [|| 7 r* — P Tjv 7 t|| 2 ] = O ■ We also obtain, by taking x = N/ (log N ) 1 / 4 , 

that 

lim sup P (p TjV QP^. ^ v) = 0, 

TV—>-+oo ^ ' 

so that, using Theorem [7] we get for some t G Tq*, 


lim sup P 

TV^+oo 


N 
A * 


K 


K 


II fk - /r-iorjvwlll - inf \ Y1 ll^fc “ /Af,fclli + pen (N,M) 


Lfc=i 


.fc=l 


-IIQ* - - w TN n 2 2 +^\ >x)<(e- irV* 

Thus, by integration and the previous results, Corollary 10 follows.

Appendix A. Concentration inequalities

We first recall results that hold both for (Scenario A) (where we consider $N$ i.i.d. samples
$(Y_1^{(s)}, Y_2^{(s)}, Y_3^{(s)})_{s=1}^{N}$ of three consecutive observations) and for (Scenario B) (where we
consider consecutive observations of the same chain).

The following proposition is the classical Bernstein's inequality for (Scenario A) and
is proved in Paulin (2013), Theorem 2.4, for (Scenario B).


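Proposition 12 Let $t$ be a real valued and measurable bounded function on $\mathcal{Y}^3$. Let $V =
\mathbb{E}[t^2(Z_1)]$. There exists a positive constant $c^*$ depending only on $Q^*$ such that for all $0 <
\lambda < 1/(2\sqrt{2}\, c^* \|t\|_\infty)$:

\[
\log \mathbb{E} \exp\Big[\lambda \sum_{s=1}^{N} \big(t(Z_s) - \mathbb{E}\, t(Z_s)\big)\Big]
\;\le\; \frac{2 N c^* V \lambda^2}{1 - 2\sqrt{2}\, c^* \|t\|_\infty \lambda}
\tag{23}
\]

so that for all $x > 0$,

\[
\mathbb{P}\Big(\sum_{s=1}^{N} \big(t(Z_s) - \mathbb{E}\, t(Z_s)\big) \;\ge\; 2\sqrt{2 N c^* V x} + 2\sqrt{2}\, c^* \|t\|_\infty x\Big) \;\le\; e^{-x}.
\tag{24}
\]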
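To make the two bounds of Proposition 12 concrete, here is a minimal sketch that evaluates the right-hand side of (23) and the deviation threshold in (24). The function names and all numerical inputs are hypothetical; in particular the constant $c^*$ is only known to exist and to depend on $Q^*$, so any value passed for it is an assumption.

```python
import math


def log_mgf_bound(n, c_star, variance, sup_norm, lam):
    """Right-hand side of (23): 2 N c* V lam^2 / (1 - 2 sqrt(2) c* ||t||_inf lam)."""
    scale = 2.0 * math.sqrt(2.0) * c_star * sup_norm
    if lam <= 0 or lam * scale >= 1:
        raise ValueError("lam must satisfy 0 < lam < 1 / (2 sqrt(2) c* ||t||_inf)")
    return 2.0 * n * c_star * variance * lam**2 / (1.0 - scale * lam)


def bernstein_threshold(n, c_star, variance, sup_norm, x):
    """Threshold in (24): 2 sqrt(2 N c* V x) + 2 sqrt(2) c* ||t||_inf x."""
    gaussian_part = 2.0 * math.sqrt(2.0 * n * c_star * variance * x)
    poisson_part = 2.0 * math.sqrt(2.0) * c_star * sup_norm * x
    return gaussian_part + poisson_part


# Hypothetical inputs: N = 1000 triples, c* = 1, V = 0.25, ||t||_inf = 1.
# With x = log(20), the centered sum exceeds this threshold with
# probability at most e^{-x} = 0.05 according to (24).
threshold = bernstein_threshold(1000, 1.0, 0.25, 1.0, math.log(20.0))
```

The first term of the threshold grows like $\sqrt{N}$ and dominates for moderate $x$, while the second term, linear in $x$, only matters far in the tail.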


We now state a deviation inequality, which comes from Massart (2007), Theorem 6.8 and
Corollary 6.9, for (Scenario A). For (Scenario B) the proof of the following proposition
follows mutatis mutandis from the proof of Theorem 6.8 (and then Corollary 6.9) in Massart
(2007), the early first step being equation (23). Recall that when $t_1$ and $t_2$ are real valued
functions, the bracket $[t_1, t_2]$ is the set of real valued functions $t$ such that $t_1(\cdot) \le t(\cdot) \le t_2(\cdot)$.
For any measurable set $A$ such that $\mathbb{P}(A) > 0$, and any integrable random variable $Z$, denote
$\mathbb{E}^A[Z] = \mathbb{E}[Z \mathbf{1}_A] / \mathbb{P}(A)$.


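Proposition 13 Let $\mathcal{T}$ be some countable class of real valued and measurable functions on
$\mathcal{Y}^3$. Assume that there exist some positive numbers $\sigma$ and $b$ such that for all $t \in \mathcal{T}$, $\|t\|_\infty \le b$
and $\mathbb{E}[t^2(Z_1)] \le \sigma^2$.

Assume furthermore that for any positive number $\delta$, there exists some finite set $B_\delta$ of brackets
covering $\mathcal{T}$ such that for any bracket $[t_1, t_2] \in B_\delta$, $\|t_1 - t_2\|_\infty \le b$ and $\mathbb{E}[(t_1 - t_2)^2(Z_1)] \le \delta^2$.

Let $e^{H(\delta)}$ denote the minimal cardinality of such a covering. Then, there exists a positive
constant $C^*$ depending only on $Q^*$ such that: for any measurable set $A$,

\[
\mathbb{E}^A\Big[\sup_{t \in \mathcal{T}} \sum_{s=1}^{N} \big(t(Z_s) - \mathbb{E}\, t(Z_s)\big)\Big]
\;\le\; C^*\Big[E + \sigma\sqrt{N \log\frac{1}{\mathbb{P}(A)}} + b \log\frac{1}{\mathbb{P}(A)}\Big]
\]

and for all positive number $x$

\[
\mathbb{P}\Big(\sup_{t \in \mathcal{T}} \sum_{s=1}^{N} \big(t(Z_s) - \mathbb{E}\, t(Z_s)\big) \;\ge\; C^*\big[E + \sigma\sqrt{Nx} + bx\big]\Big) \;\le\; \exp(-x),
\]

where

\[
E = \sqrt{N} \int_0^{\sigma} \sqrt{H(u) \wedge N}\, du + (b + \sigma) H(\sigma).
\]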
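The entropy term $E$ of Proposition 13 can be evaluated numerically once a bracketing entropy $H$ is available. The sketch below uses a plain midpoint rule; the function name, the number of steps, and the example entropy $H(u) = m \log(1 + 1/u)$ (a parametric-type entropy with hypothetical dimension $m$) are illustrative assumptions, not taken from the paper.

```python
import math


def entropy_term(H, sigma, b, n, steps=10_000):
    """E = sqrt(n) * int_0^sigma sqrt(min(H(u), n)) du + (b + sigma) * H(sigma).

    The integral is approximated with a midpoint rule, which never
    evaluates H at u = 0, where H may be infinite.
    """
    width = sigma / steps
    integral = width * sum(
        math.sqrt(min(H((i + 0.5) * width), n)) for i in range(steps)
    )
    return math.sqrt(n) * integral + (b + sigma) * H(sigma)


# Hypothetical example: entropy of a model of dimension m = 5.
def parametric_entropy(u, m=5):
    return m * math.log(1.0 + 1.0 / u)


E_example = entropy_term(parametric_entropy, sigma=0.5, b=1.0, n=1000)
```

The truncation $H(u) \wedge N$ is what keeps the integrand bounded near $u = 0$ in the midpoint sum, mirroring its role in the formula.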
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Appendix B. Expression of polynomial P 5 

Computer assisted computations (available at https : //mycore. core-cloud.net/public ,php?service=f ilesl 

give that: 

P_5 =

144 - 114 t^2 x^8 - 108 t^2 x^10 - 18 t^2 x^12 +

192 t~2 + 128 t~4 + 256 t~6 + 176 t~8 + 576 x~2 + 624 t~2 x~2 + 

672 t~4 x~2 + 1776 t~6 x~2 + 1152 t~8 x~2 + 972 x~4 + 720 t~2 x~4 + 

1884 t~4 x~4 + 5496 t~6 x~4 + 3360 t~8 x~4 + 900 x~6 + 264 t~2 x~6 + 

3556 t~4 x~6 + 9920 t~6 x~6 + 5728 t~8 x~6 + 495 x~8 + 

4551 t~4 x“8 + 11424 t~6 x~8 + 6264 t~8 x~8 + 162 x“10 + 

3810 t~4 x“10 + 8592 t~6 x~10 + 4512 t~8 x“10 + 

27 x~12 + 1979 t~4 x~12 + 4120 t~6 x~12 + 

2096 t~8 x~12 + 576 t~4 x~14 + 1152 t~6 x~14 + 576 t~8 x“14 + 

72 t~4 x~16 + 144 t~6 x“16 + 72 t~8 x“16 + 144 y“2 + 480 t~2 y“2 + 

784 t~4 y~2 + 704 t~6 y~2 + 256 t~8 y~2 + 576 x~2 y~2 + 

2064 t~2 x~2 y~2 + 4192 t~4 x~2 y~2 + 4496 t~6 x~2 y“2 + 

1792 t~8 x~2 y~2 + 1080 x~4 y~2 + 4104 t~2 x~4 y~2 + 

10760 t~4 x~4 y“2 + 13528 t~6 x~4 y~2 + 5792 t~8 x~4 y~2 + 

1224 x~6 y“2 + 5016 t~2 x~6 y~2 + 17592 t~4 x~6 y~2 + 

25032 t~6 x~6 y“2 + 11232 t~8 x~6 y~2 + 900 x“8 y~2 + 

4224 t~2 x“8 y~2 + 19924 t~4 x~8 y~2 + 30776 t~6 x~8 y~2 + 

14176 t~8 x~8 y~2 + 432 x~10 y~2 + 2520 t~2 x~10 y~2 + 

15584 t~4 x~10 y~2 + 25336 t“6 x“10 y~2 + 11840 t~8 x“10 y~2 + 

108 x~12 y“2 + 936 t~2 x~12 y~2 + 7916 t~4 x~12 y~2 + 

13456 t~6 x~12 y~2 + 6368 t~8 x~12 y~2 + 144 t~2 x~14 y~2 + 

2304 t~4 x“14 y“2 + 4176 t~6 x~14 y~2 + 2016 t~8 x~14 y~2 + 

288 t~4 x~16 y~2 + 576 t~6 x~16 y~2 + 288 t~8 x~16 y~2 + 144 y~4 + 

480 t“2 y~4 + 624 t~4 y~4 + 384 t“6 y"4 + 96 t~8 y~4 + 576 x~2 y~4 + 

2208 t~2 x~2 y~4 + 3392 t~4 x~2 y~4 + 2464 t~6 x~2 y~4 + 

704 t“8 x~2 y“4 + 1188 x~4 y~4 + 5256 t~2 x~4 y~4 + 

9636 t~4 x~4 y~4 + 8256 t~6 x~4 y~4 + 2688 t~8 x~4 y~4 + 

1548 x~6 y~4 + 8112 t~2 x~6 y~4 + 18076 t~4 x~6 y~4 + 

18008 t~6 x~6 y"4 + 6496 t~8 x~6 y~4 + 1359 x~8 y~4 + 

8598 t~2 x~8 y~4 + 23375 t~4 x~8 y~4 + 26392 t~6 x~8 y~4 + 

10256 t~8 x~8 y"4 + 810 x~10 y~4 + 6156 t~2 x~10 y~4 + 

20442 t~4 x~10 y~4 + 25656 t“6 x“10 y“4 + 10560 t~8 x'10 y~4 + 

243 x~12 y“4 + 2574 t~2 x~12 y~4 + 11299 t~4 x~12 y“4 + 

15848 t~6 x~12 y~4 + 6880 t~8 x~12 y~4 + 432 t~2 x~14 y~4 + 

3456 t~4 x~14 y"4 + 5616 t~6 x~14 y~4 + 2592 t~8 x~14 y~4 + 

432 t~4 x~16 y~4 + 864 t'6 x“16 y~4 + 432 t~8 x~16 y"4 + 

216 x"4 y~6 + 720 t~2 x~4 y~6 + 952 t~4 x~4 y~6 + 608 t~6 x~4 y~6 + 

160 t“8 x"4 y~6 + 648 x~6 y~6 + 2592 t~2 x~6 y~6 + 

4168 t~4 x~6 y~6 + 3152 t~6 x~6 y~6 + 928 t~8 x"6 y~6 + 

918 x~8 y~6 + 4428 t~2 x~8 y~6 + 8502 t~4 x~8 y~6 + 

7392 t~6 x~8 y~6 + 2400 t~8 x~8 y~6 + 756 x~10 y~6 + 

4392 t~2 x“10 y“6 + 10036 t~4 x~10 y~6 + 9920 t“6 x“10 y~6 + 

3520 t~8 x~10 y~6 + 270 x~12 y~6 + 2268 t~2 x~12 y~6 + 

6766 t~4 x~12 y“6 + 7808 t~6 x~12 y~6 + 3040 t~8 x~12 y~6 + 

432 t~2 x“14 y~6 + 2304 t~4 x“14 y“6 + 3312 t~6 x~14 y~6 + 
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1440 t~8 x~14 y~6 + 288 t~4 x~16 y~6 + 576 t~6 x~16 y~6 + 

288 t~8 x~16 y~6 + 108 x~8 y~8 + 360 t~2 x~8 y~8 + 468 t~4 x~8 y~8 + 
288 t~6 x~8 y~8 + 72 t~8 x~8 y~8 + 216 x~10 y~8 + 864 t~2 x~10 y~8 + 

1368 t~4 x~10 y~8 + 1008 t~6 x~10 y~8 + 288 t~8 x~10 y~8 + 

108 x~12 y~8 + 648 t~2 x~12 y~8 + 1404 t~4 x~12 y~8 + 

1296 t~6 x~12 y~8 + 432 t~8 x~12 y~8 + 144 t~2 x“14 y~8 + 

576 t~4 x~14 y~8 + 720 t~6 x~14 y~8 + 288 t~8 x~14 y~8 + 

72 t~4 x~16 y“8 + 144 t~6 x~16 y~8 + 72 t~8 x~16 y~8 + 192 z~2 + 

416 t~2 z~2 + 288 t~4 z~2 + 320 t~6 z~2 + 256 t~8 z~2 + 

912 x~2 z~2 + 1664 t~2 x~2 z~2 + 1248 t~4 x~2 z~2 + 

2304 t~6 x~2 z~2 + 1808 t~8 x~2 z~2 + 1728 x~4 z~2 + 

2520 t~2 x~4 z~2 + 2776 t~4 x~4 z~2 + 7624 t~6 x~4 z~2 + 

5640 t~8 x~4 z~2 + 1704 x~6 z~2 + 1736 t~2 x~6 z~2 + 

4664 t~4 x~6 z~2 + 14808 t~6 x~6 z~2 + 10176 t~8 x~6 z~2 + 

966 x~8 z~2 + 494 t~2 x~8 z~2 + 6098 t~4 x~8 z~2 + 

18218 t~6 x~8 z~2 + 11648 t~8 x~8 z~2 + 324 x~10 z~2 + 

36 t~2 x~10 z~2 + 5468 t~4 x~10 z~2 + 14444 t~6 x~10 z~2 + 

8688 t~8 x~10 z~2 + 54 x~12 z~2 + 6 t~2 x~12 z~2 + 

3002 t~4 x~12 z~2 + 7186 t~6 x~12 z~2 + 4136 t~8 x~12 z~2 + 

896 t~4 x~14 z~2 + 2048 t~6 x~14 z~2 + 1152 t~8 x~14 z~2 + 

112 t~4 x~16 z~2 + 256 t~6 x~16 z~2 + 144 t~8 x~16 z~2 + 

480 y~2 z~2 + 1312 t~2 y~2 z~2 + 1888 t~4 y~2 z~2 + 

1760 t~6 y~2 z~2 + 704 t~8 y~2 z~2 + 1776 x~2 y~2 z~2 + 

5248 t~2 x~2 y~2 z~2 + 9504 t~4 x~2 y~2 z~2 + 

10624 t~6 x~2 y~2 z~2 + 4592 t~8 x~2 y~2 z~2 + 3096 x"4 y~2 z~2 + 
9904 t~2 x~4 y~2 z~2 + 23104 t~4 x~4 y~2 z~2 + 

30288 t~6 x~4 y~2 z~2 + 13992 t~8 x~4 y~2 z~2 + 3144 x~6 y~2 z~2 + 

11344 t~2 x~6 y~2 z~2 + 35712 t~4 x~6 y~2 z~2 + 

53424 t~6 x~6 y~2 z~2 + 25912 t~8 x~6 y~2 z~2 + 2064 x~8 y~2 z~2 + 

9016 t~2 x"8 y~2 z~2 + 38552 t~4 x~8 y~2 z~2 + 

63192 t~6 x~8 y~2 z~2 + 31592 t~8 x~8 y~2 z~2 + 936 x~10 y~2 z~2 + 
5248 t~2 x~10 y~2 z~2 + 29072 t~4 x~10 y~2 z~2 + 

50464 t~6 x~10 y~2 z~2 + 25704 t“8 x~10 y~2 z~2 + 216 x~12 y~2 z~2 + 
1872 t~2 x~12 y~2 z~2 + 14192 t~4 x~12 y~2 z~2 + 

26056 t~6 x~12 y~2 z~2 + 13520 t~8 x~12 y~2 z~2 + 

264 t~2 x~14 y~2 z~2 + 3896 t"4 x~14 y~2 z~2 + 

7808 t~6 x“14 y~2 z~2 + 4176 t~8 x“14 y~2 z~2 + 

448 t~4 x~16 y~2 z~2 + 1024 t~6 x~16 y~2 z~2 + 

576 t~8 x~16 y~2 z~2 + 480 y~4 z"2 + 1632 t~2 y~4 z~2 + 

2208 t~4 y~4 z~2 + 1440 t~6 y~4 z~2 + 384 t~8 y~4 z~2 + 

1632 x~2 y~4 z~2 + 6528 t~2 x~2 y~4 z~2 + 10688 t~4 x"2 y~4 z~2 + 

8320 t~6 x~2 y~4 z~2 + 2528 t~8 x~2 y~4 z~2 + 3240 x~4 y~4 z~2 + 

14280 t~2 x~4 y~4 z~2 + 27448 t~4 x~4 y~4 z~2 + 

25048 t~6 x~4 y~4 z~2 + 8640 t~8 x~4 y"4 z"2 + 3936 x~6 y"4 z~2 + 
19992 t~2 x~6 y~4 z~2 + 46552 t~4 x~6 y~4 z~2 + 

49352 t~6 x~6 y~4 z~2 + 18856 t~8 x~6 y~4 z~2 + 3198 x~8 y~4 z~2 + 

19518 t~2 x~8 y~4 z~2 + 55218 t~4 x~8 y~4 z~2 + 

66170 t~6 x~8 y~4 z~2 + 27272 t~8 x~8 y~4 z~2 + 1836 x~10 y~4 z~2 + 

13332 t~2 x~10 y~4 z~2 + 44988 t~4 x~10 y~4 z~2 + 

59580 t~6 x~10 y~4 z~2 + 26088 t~8 x~10 y~4 z~2 + 486 x~12 y~4 z~2 + 

5214 t~2 x~12 y~4 z~2 + 22994 t~4 x~12 y~4 z~2 + 
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34194 t~6 x~12 y~4 z~2 + 15928 t~8 x~12 y~4 z~2 + 

792 t~2 x~14 y~4 z~2 + 6312 t~4 x~14 y~4 z~2 + 

11136 t~6 x~14 y~4 z~2 + 5616 t~8 x~14 y~4 z~2 + 

672 t~4 x~16 y~4 z~2 + 1536 t~6 x~16 y~4 z~2 + 

864 t~8 x~16 y~4 z~2 + 720 x~4 y~6 z~2 + 2480 t~2 x~4 y~6 z~2 + 

3472 t~4 x~4 y~6 z~2 + 2384 t~6 x~4 y~6 z~2 + 672 t~8 x~4 y~6 z~2 + 

1728 x~6 y~6 z~2 + 7440 t~2 x~6 y~6 z~2 + 13072 t~4 x~6 y~6 z~2 + 
10736 t~6 x~6 y~6 z~2 + 3376 t~8 x~6 y~6 z~2 + 2268 x~8 y~6 z~2 + 
11484 t~2 x~8 y~6 z~2 + 23812 t~4 x~8 y~6 z~2 + 

22276 t~6 x~8 y~6 z~2 + 7680 t~8 x~8 y~6 z~2 + 1800 x~10 y~6 z~2 + 
10568 t~2 x~10 y~6 z~2 + 25560 t~4 x~10 y~6 z~2 + 

26872 t~6 x~10 y~6 z~2 + 10080 t~8 x~10 y~6 z~2 + 540 x~12 y~6 z~2 + 

4836 t~2 x~12 y~6 z~2 + 15420 t~4 x~12 y~6 z~2 + 

18964 t~6 x~12 y~6 z~2 + 7840 t~8 x~12 y~6 z~2 + 

792 t~2 x~14 y~6 z~2 + 4520 t~4 x~14 y~6 z~2 + 

7040 t~6 x~14 y~6 z~2 + 3312 t~8 x~14 y~6 z~2 + 

448 t~4 x~16 y~6 z~2 + 1024 t~6 x~16 y~6 z~2 + 

576 t~8 x~16 y~6 z~2 + 360 x~8 y~8 z~2 + 1224 t~2 x~8 y~8 z~2 + 

1656 t~4 x~8 y~8 z~2 + 1080 t~6 x~8 y~8 z~2 + 288 t~8 x~8 y~8 z~2 + 

576 x~10 y~8 z~2 + 2448 t~2 x~10 y~8 z~2 + 4176 t~4 x~10 y~8 z~2 + 

3312 t~6 x~10 y~8 z~2 + 1008 t~8 x~10 y~8 z~2 + 216 x~12 y~8 z~2 + 

1488 t~2 x~12 y~8 z~2 + 3616 t~4 x~12 y~8 z~2 + 

3640 t~6 x~12 y~8 z~2 + 1296 t~8 x~12 y~8 z~2 + 

264 t~2 x~14 y“8 z~2 + 1208 t~4 x“14 y“8 z'2 + 

1664 t~6 x~14 y~8 z~2 + 720 t~8 x~14 y~8 z~2 + 

112 t~4 x~16 y~8 z~2 + 256 t~6 x~16 y~8 z~2 + 144 t~8 x~16 y~8 z~2 + 
128 z~4 + 288 t~2 z~4 + 352 t~4 z~4 + 384 t~6 z~4 + 256 t~8 z~4 + 

352 x~2 z~4 + 1056 t~2 x~2 z~4 + 1408 t~4 x~2 z~4 + 

1952 t~6 x~2 z~4 + 1504 t~8 x~2 z~4 + 764 x~4 z~4 + 

2104 t~2 x"4 z~4 + 2616 t~4 x~4 z~4 + 5016 t~6 x~4 z~4 + 

4252 t~8 x~4 z~4 + 804 x~6 z~4 + 1912 t~2 x~6 z~4 + 

2920 t~4 x~6 z~4 + 8536 t~6 x~6 z~4 + 7364 t~8 x~6 z~4 + 

471 x~8 z~4 + 898 t~2 x"8 z~4 + 2694 t~4 x"8 z~4 + 

10058 t~6 x~8 z~4 + 8335 t~8 x~8 z~4 + 162 x~10 z~4 + 

252 t~2 x~10 z~4 + 2164 t~4 x~10 z~4 + 7980 t~6 x~10 z~4 + 

6226 t~8 x"10 z~4 + 27 x~12 z~4 + 42 t~2 x~12 z~4 + 

1182 t~4 x~12 z~4 + 4018 t~6 x~12 z~4 + 2979 t~8 x~12 z~4 + 

352 t~4 x~14 z~4 + 1152 t~6 x~14 z~4 + 832 t~8 x~14 z~4 + 

44 t~4 x~16 z~4 + 144 t“6 x~16 z~4 + 104 t~8 x~16 z~4 + 

784 y“2 z~4 + 1888 t~2 y“2 z~4 + 2208 t~4 y"2 z~4 + 

1888 t~6 y~2 z~4 + 784 t~8 y~2 z~4 + 2080 x~2 y~2 z~4 + 

5600 t~2 x~2 y~2 z~4 + 8832 t~4 x~2 y~2 z~4 + 9952 t~6 x~2 y~2 z~4 + 

4640 t~8 x~2 y~2 z~4 + 3368 x~4 y~2 z~4 + 9440 t~2 x~4 y~2 z~4 + 

18928 t~4 x~4 y~2 z~4 + 25952 t~6 x~4 y~2 z~4 + 

13224 t~8 x~4 y~2 z~4 + 2840 x~6 y~2 z~4 + 9056 t~2 x"6 y~2 z~4 + 

25872 t~4 x~6 y~2 z~4 + 42464 t~6 x~6 y~2 z~4 + 

23192 t~8 x~6 y~2 z~4 + 1524 x~8 y~2 z~4 + 6072 t~2 x"8 y~2 z~4 + 

25016 t~4 x~8 y~2 z~4 + 46792 t~6 x~8 y~2 z~4 + 

26900 t~8 x~8 y~2 z~4 + 576 x~10 y~2 z~4 + 3184 t~2 x~10 y~2 z~4 + 

17216 t~4 x~10 y~2 z~4 + 35024 t“6 x~10 y~2 z~4 + 

20928 t~8 x~10 y~2 z~4 + 108 x~12 y~2 z~4 + 1008 t~2 x~12 y~2 z~4 + 
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7584 t~4 x~12 y~2 z~4 + 16968 t~6 x~12 y~2 z~4 + 

10572 t~8 x~12 y~2 z~4 + 120 t~2 x~14 y~2 z~4 + 

1816 t~4 x~14 y~2 z~4 + 4736 t~6 x~14 y~2 z~4 + 

3136 t~8 x~14 y~2 z~4 + 176 t~4 x~16 y~2 z~4 + 

576 t^6 x^16 y^2 z^4 + 416 t^8 x^16 y^2 z^4 + 624 y^4 z^4 +
2208 t^2 y^4 z^4 + 3168 t^4 y^4 z^4 + 2208 t^6 y^4 z^4 +
624 t^8 y^4 z^4 + 1600 x^2 y^4 z^4 + 6976 t^2 x^2 y^4 z^4 +
12672 t^4 x^2 y^4 z^4 + 10816 t^6 x^2 y^4 z^4 +
3520 t^8 x^2 y^4 z^4 + 3364 x^4 y^4 z^4 + 14456 t^2 x^4 y^4 z^4 +
29416 t^4 x^4 y^4 z^4 + 29016 t^6 x^4 y^4 z^4 +
10692 t^8 x^4 y^4 z^4 + 3452 x^6 y^4 z^4 + 17336 t^2 x^6 y^4 z^4 +
43896 t^4 x^6 y^4 z^4 + 51032 t^6 x^6 y^4 z^4 +
21020 t^8 x^6 y^4 z^4 + 2495 x^8 y^4 z^4 + 14658 t^2 x^8 y^4 z^4 +
45814 t^4 x^8 y^4 z^4 + 61162 t^6 x^8 y^4 z^4 +
27607 t^8 x^8 y^4 z^4 + 1242 x^10 y^4 z^4 + 8892 t^2 x^10 y^4 z^4 +
33252 t^4 x^10 y^4 z^4 + 49644 t^6 x^10 y^4 z^4 +
24234 t^8 x^10 y^4 z^4 + 243 x^12 y^4 z^4 + 2914 t^2 x^12 y^4 z^4 +
14758 t^4 x^12 y^4 z^4 + 25538 t^6 x^12 y^4 z^4 +
13643 t^8 x^12 y^4 z^4 + 360 t^2 x^14 y^4 z^4 +
3336 t^4 x^14 y^4 z^4 + 7296 t^6 x^14 y^4 z^4 +
4416 t^8 x^14 y^4 z^4 + 264 t^4 x^16 y^4 z^4 +
864 t^6 x^16 y^4 z^4 + 624 t^8 x^16 y^4 z^4 + 952 x^4 y^6 z^4 +
3472 t^2 x^4 y^6 z^4 + 5232 t^4 x^4 y^6 z^4 + 3856 t^6 x^4 y^6 z^4 +
1144 t^8 x^4 y^6 z^4 + 1544 x^6 y^6 z^4 + 7760 t^2 x^6 y^6 z^4 +
15696 t^4 x^6 y^6 z^4 + 14288 t^6 x^6 y^6 z^4 +
4808 t^8 x^6 y^6 z^4 + 1942 x^8 y^6 z^4 + 10532 t^2 x^8 y^6 z^4 +
24556 t^4 x^8 y^6 z^4 + 25380 t^6 x^8 y^6 z^4 +
9414 t^8 x^8 y^6 z^4 + 1332 x^10 y^6 z^4 + 8408 t^2 x^10 y^6 z^4 +
22952 t^4 x^10 y^6 z^4 + 26776 t^6 x^10 y^6 z^4 +
10900 t^8 x^10 y^6 z^4 + 270 x^12 y^6 z^4 + 2972 t^2 x^12 y^6 z^4 +
11492 t^4 x^12 y^6 z^4 + 16244 t^6 x^12 y^6 z^4 +
7486 t^8 x^12 y^6 z^4 + 360 t^2 x^14 y^6 z^4 +
2632 t^4 x^14 y^6 z^4 + 4992 t^6 x^14 y^6 z^4 +
2752 t^8 x^14 y^6 z^4 + 176 t^4 x^16 y^6 z^4 +
576 t^6 x^16 y^6 z^4 + 416 t^8 x^16 y^6 z^4 + 468 x^8 y^8 z^4 +
1656 t^2 x^8 y^8 z^4 + 2376 t^4 x^8 y^8 z^4 + 1656 t^6 x^8 y^8 z^4 +
468 t^8 x^8 y^8 z^4 + 504 x^10 y^8 z^4 + 2448 t^2 x^10 y^8 z^4 +
4752 t^4 x^10 y^8 z^4 + 4176 t^6 x^10 y^8 z^4 +
1368 t^8 x^10 y^8 z^4 + 108 x^12 y^8 z^4 + 1024 t^2 x^12 y^8 z^4 +
3136 t^4 x^12 y^8 z^4 + 3656 t^6 x^12 y^8 z^4 +
1436 t^8 x^12 y^8 z^4 + 120 t^2 x^14 y^8 z^4 +
760 t^4 x^14 y^8 z^4 + 1280 t^6 x^14 y^8 z^4 +
640 t^8 x^14 y^8 z^4 + 44 t^4 x^16 y^8 z^4 + 144 t^6 x^16 y^8 z^4 +
104 t^8 x^16 y^8 z^4 + 256 z^6 + 320 t^2 z^6 + 384 t^4 z^6 +
352 t^6 z^6 + 160 t^8 z^6 + 272 x^2 z^6 + 256 t^2 x^2 z^6 +
1120 t^4 x^2 z^6 + 1408 t^6 x^2 z^6 + 784 t^8 x^2 z^6 +
232 x^4 z^6 + 456 t^2 x^4 z^6 + 2104 t^4 x^4 z^6 +
2712 t^6 x^4 z^6 + 1856 t^8 x^4 z^6 + 96 x^6 z^6 + 472 t^2 x^6 z^6 +
2072 t^4 x^6 z^6 + 3208 t^6 x^6 z^6 + 2792 t^8 x^6 z^6 +
24 x^8 z^6 + 298 t^2 x^8 z^6 + 1178 t^4 x^8 z^6 + 2686 t^6 x^8 z^6 +
2870 t^8 x^8 z^6 + 108 t^2 x^10 z^6 + 396 t^4 x^10 z^6 +
1668 t^6 x^10 z^6 + 2020 t^8 x^10 z^6 + 18 t^2 x^12 z^6 +
66 t^4 x^12 z^6 + 726 t^6 x^12 z^6 + 934 t^8 x^12 z^6 +
192 t^6 x^14 z^6 + 256 t^8 x^14 z^6 + 24 t^6 x^16 z^6 +
32 t^8 x^16 z^6 + 704 y^2 z^6 + 1760 t^2 y^2 z^6 +
1888 t^4 y^2 z^6 + 1312 t^6 y^2 z^6 + 480 t^8 y^2 z^6 +
1136 x^2 y^2 z^6 + 3456 t^2 x^2 y^2 z^6 + 5152 t^4 x^2 y^2 z^6 +
5248 t^6 x^2 y^2 z^6 + 2416 t^8 x^2 y^2 z^6 + 1768 x^4 y^2 z^6 +
5200 t^2 x^4 y^2 z^6 + 9152 t^4 x^4 y^2 z^6 +
11696 t^6 x^4 y^2 z^6 + 6232 t^8 x^4 y^2 z^6 + 1144 x^6 y^2 z^6 +
3760 t^2 x^6 y^2 z^6 + 9984 t^4 x^6 y^2 z^6 +
16720 t^6 x^6 y^2 z^6 + 10120 t^8 x^6 y^2 z^6 + 456 x^8 y^2 z^6 +
1752 t^2 x^8 y^2 z^6 + 7592 t^4 x^8 y^2 z^6 +
16024 t^6 x^8 y^2 z^6 + 10880 t^8 x^8 y^2 z^6 + 72 x^10 y^2 z^6 +
544 t^2 x^10 y^2 z^6 + 3952 t^4 x^10 y^2 z^6 +
10304 t^6 x^10 y^2 z^6 + 7848 t^8 x^10 y^2 z^6 +
72 t^2 x^12 y^2 z^6 + 1160 t^4 x^12 y^2 z^6 +
4192 t^6 x^12 y^2 z^6 + 3680 t^8 x^12 y^2 z^6 +
128 t^4 x^14 y^2 z^6 + 952 t^6 x^14 y^2 z^6 +
1016 t^8 x^14 y^2 z^6 + 96 t^6 x^16 y^2 z^6 + 128 t^8 x^16 y^2 z^6 +
384 y^4 z^6 + 1440 t^2 y^4 z^6 + 2208 t^4 y^4 z^6 +
1632 t^6 y^4 z^6 + 480 t^8 y^4 z^6 + 608 x^2 y^4 z^6 +
3200 t^2 x^2 y^4 z^6 + 6848 t^4 x^2 y^4 z^6 + 6528 t^6 x^2 y^4 z^6 +
2272 t^8 x^2 y^4 z^6 + 1760 x^4 y^4 z^6 + 7128 t^2 x^4 y^4 z^6 +
15128 t^4 x^4 y^4 z^6 + 16008 t^6 x^4 y^4 z^6 +
6248 t^8 x^4 y^4 z^6 + 1288 x^6 y^4 z^6 + 6856 t^2 x^6 y^4 z^6 +
19576 t^4 x^6 y^4 z^6 + 25176 t^6 x^6 y^4 z^6 +
11168 t^8 x^6 y^4 z^6 + 832 x^8 y^4 z^6 + 4730 t^2 x^8 y^4 z^6 +
17242 t^4 x^8 y^4 z^6 + 26382 t^6 x^8 y^4 z^6 +
13230 t^8 x^8 y^4 z^6 + 216 x^10 y^4 z^6 + 1980 t^2 x^10 y^4 z^6 +
10092 t^4 x^10 y^4 z^6 + 18420 t^6 x^10 y^4 z^6 +
10476 t^8 x^10 y^4 z^6 + 274 t^2 x^12 y^4 z^6 +
3186 t^4 x^12 y^4 z^6 + 7806 t^6 x^12 y^4 z^6 +
5278 t^8 x^12 y^4 z^6 + 384 t^4 x^14 y^4 z^6 +
1704 t^6 x^14 y^4 z^6 + 1512 t^8 x^14 y^4 z^6 +
144 t^6 x^16 y^4 z^6 + 192 t^8 x^16 y^4 z^6 + 608 x^4 y^6 z^6 +
2384 t^2 x^4 y^6 z^6 + 3856 t^4 x^4 y^6 z^6 + 2992 t^6 x^4 y^6 z^6 +
912 t^8 x^4 y^6 z^6 + 496 x^6 y^6 z^6 + 3568 t^2 x^6 y^6 z^6 +
8848 t^4 x^6 y^6 z^6 + 8976 t^6 x^6 y^6 z^6 + 3200 t^8 x^6 y^6 z^6 +
752 x^8 y^6 z^6 + 4356 t^2 x^8 y^6 z^6 + 11780 t^4 x^8 y^6 z^6 +
13596 t^6 x^8 y^6 z^6 + 5420 t^8 x^8 y^6 z^6 + 288 x^10 y^6 z^6 +
2552 t^2 x^10 y^6 z^6 + 8984 t^4 x^10 y^6 z^6 +
12232 t^6 x^10 y^6 z^6 + 5512 t^8 x^10 y^6 z^6 +
404 t^2 x^12 y^6 z^6 + 3156 t^4 x^12 y^6 z^6 +
5940 t^6 x^12 y^6 z^6 + 3252 t^8 x^12 y^6 z^6 +
384 t^4 x^14 y^6 z^6 + 1320 t^6 x^14 y^6 z^6 +
1000 t^8 x^14 y^6 z^6 + 96 t^6 x^16 y^6 z^6 + 128 t^8 x^16 y^6 z^6 +
288 x^8 y^8 z^6 + 1080 t^2 x^8 y^8 z^6 + 1656 t^4 x^8 y^8 z^6 +
1224 t^6 x^8 y^8 z^6 + 360 t^8 x^8 y^8 z^6 + 144 x^10 y^8 z^6 +
1008 t^2 x^10 y^8 z^6 + 2448 t^4 x^10 y^8 z^6 +
2448 t^6 x^10 y^8 z^6 + 864 t^8 x^10 y^8 z^6 +
184 t^2 x^12 y^8 z^6 + 1064 t^4 x^12 y^8 z^6 +
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1600 t~6 x~12 y~8 z~6 + 720 t~8 x~12 y~8 z~6 + 

128 t~4 x~14 y~8 z~6 + 376 t~6 x~14 y~8 z~6 + 248 t~8 x~14 y~8 z~6 + 
24 t~6 x~16 y~8 z~6 + 32 t~8 x~16 y~8 z~6 + 176 z~8 + 256 t~2 z~8 + 
256 t~4 z~8 + 160 t~6 z~8 + 48 t~8 z~8 + 256 x~2 z~8 + 

240 t~2 x~2 z~8 + 544 t~4 x~2 z~8 + 496 t~6 x~2 z~8 + 

192 t~8 x~2 z~8 + 224 x~4 z~8 + 152 t~2 x~4 z~8 + 892 t~4 x~4 z~8 + 

848 t~6 x~4 z~8 + 396 t~8 x~4 z~8 + 96 x~6 z~8 + 32 t~2 x~6 z~8 + 

900 t~4 x~6 z~8 + 840 t~6 x~6 z~8 + 516 t~8 x~6 z~8 + 24 x~8 z~8 + 

8 t~2 x~8 z~8 + 575 t~4 x~8 z~8 + 510 t~6 x~8 z~8 + 

463 t~8 x~8 z~8 + 210 t~4 x~10 z~8 + 180 t~6 x~10 z~8 + 

290 t~8 x~10 z~8 + 35 t~4 x~12 z~8 + 30 t~6 x~12 z~8 + 

123 t~8 x~12 z~8 + 32 t~8 x~14 z~8 + 4 t~8 x~16 z~8 + 256 y~2 z~8 + 

704 t~2 y~2 z~8 + 784 t~4 y~2 z~8 + 480 t~6 y~2 z~8 + 

144 t~8 y~2 z~8 + 256 x~2 y~2 z~8 + 1040 t~2 x~2 y~2 z~8 + 

1632 t~4 x~2 y~2 z~8 + 1424 t~6 x~2 y~2 z~8 + 576 t~8 x~2 y~2 z~8 + 

416 x~4 y~2 z~8 + 1560 t~2 x~4 y~2 z~8 + 2696 t~4 x~4 y~2 z~8 + 

2760 t~6 x~4 y~2 z~8 + 1336 t~8 x~4 y~2 z~8 + 224 x~6 y~2 z~8 + 

1032 t~2 x~6 y~2 z~8 + 2616 t~4 x~6 y~2 z~8 + 3416 t~6 x~6 y~2 z~8 + 

1992 t~8 x~6 y~2 z~8 + 96 x~8 y~2 z~8 + 472 t~2 x~8 y~2 z~8 + 

1780 t~4 x~8 y~2 z~8 + 2800 t~6 x~8 y~2 z~8 + 1972 t~8 x~8 y~2 z~8 + 

88 t~2 x~10 y~2 z~8 + 736 t~4 x~10 y~2 z~8 + 1432 t~6 x~10 y~2 z~8 + 
1296 t~8 x~10 y~2 z~8 + 140 t~4 x~12 y~2 z~8 + 

400 t~6 x~12 y~2 z~8 + 548 t"8 x"12 y~2 z~8 + 40 t~6 x~14 y~2 z~8 + 
136 t~8 x~14 y~2 z~8 + 16 t~8 x~16 y~2 z~8 + 96 y~4 z~8 + 

384 t~2 y~4 z"8 + 624 t~4 y~4 z~8 + 480 t~6 y~4 z~8 + 

144 t~8 y~4 z~8 + 64 x~2 y~4 z~8 + 544 t~2 x~2 y~4 z~8 + 

1472 t~4 x"2 y~4 z~8 + 1568 t"6 x~2 y~4 z~8 + 576 t~8 x~2 y~4 z~8 + 

448 x~4 y"4 z~8 + 1696 t~2 x~4 y~4 z~8 + 3524 t~4 x~4 y~4 z~8 + 

3784 t~6 x"4 y~4 z~8 + 1508 t"8 x~4 y~4 z~8 + 224 x"6 y~4 z~8 + 

1400 t~2 x~6 y~4 z~8 + 4156 t~4 x~6 y~4 z~8 + 5488 t~6 x~6 y~4 z~8 + 

2508 t~8 x~6 y~4 z~8 + 176 x~8 y"4 z~8 + 992 t~2 x~8 y~4 z~8 + 

3367 t~4 x~8 y~4 z~8 + 5190 t~6 x~8 y~4 z~8 + 2735 t~8 x"8 y~4 z~8 + 

264 t~2 x~10 y~4 z~8 + 1578 t~4 x~10 y~4 z"8 + 

3084 t~6 x~10 y~4 z~8 + 1962 t~8 x~10 y~4 z~8 + 

315 t~4 x~12 y~4 z~8 + 998 t~6 x"12 y~4 z~8 + 875 t"8 x~12 y~4 z~8 + 

120 t~6 x~14 y~4 z~8 + 216 t~8 x"14 y~4 z~8 + 24 t~8 x~16 y~4 z~8 + 

160 x~4 y~6 z~8 + 672 t~2 x~4 y~6 z~8 + 1144 t~4 x~4 y~6 z~8 + 

912 t~6 x~4 y"6 z~8 + 280 t~8 x~4 y~6 z~8 + 32 x~6 y~6 z~8 + 

656 t~2 x~6 y~6 z~8 + 2056 t~4 x~6 y~6 z~8 + 2272 t~6 x~6 y~6 z~8 + 
840 t~8 x~6 y"6 z~8 + 160 x~8 y~6 z~8 + 880 t~2 x~8 y~6 z~8 + 

2534 t~4 x"8 y~6 z~8 + 3100 t~6 x~8 y~6 z~8 + 1286 t~8 x"8 y~6 z~8 + 
320 t~2 x~10 y~6 z~8 + 1556 t~4 x~10 y~6 z"8 + 

2408 t~6 x~10 y~6 z~8 + 1172 t~8 x~10 y~6 z~8 + 

350 t~4 x~12 y~6 z~8 + 916 t~6 x~12 y~6 z~8 + 598 t~8 x~12 y~6 z~8 + 

120 t~6 x~14 y~6 z~8 + 152 t~8 x"14 y~6 z~8 + 16 t~8 x~16 y~6 z~8 + 

72 x"8 y~8 z~8 + 288 t~2 x~8 y~8 z~8 + 468 t~4 x"8 y~8 z~8 + 

360 t~6 x~8 y"8 z~8 + 108 t~8 x~8 y~8 z~8 + 144 t~2 x~10 y~8 z~8 + 
504 t~4 x~10 y~8 z~8 + 576 t~6 x"10 y~8 z~8 + 216 t~8 x~10 y~8 z~8 + 

140 t~4 x~12 y~8 z~8 + 288 t~6 x"12 y~8 z~8 + 148 t~8 x~12 y~8 z~8 + 

40 t~6 x~14 y~8 z~8 + 40 t~8 x~14 y~8 z~8 + 4 t~8 x"16 y~8 z~8 
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